Main

Infections with human coronavirus 2019 (HCoV-19)1,2, named as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the International Committee on Taxonomy of Viruses3, can result in coronavirus disease 2019 (COVID-19), characterized by various clinical outcomes from asymptomatic infections to severe pneumonia and even death4,5. Globally, as of 28 February 2023, more than 758 million confirmed cases and more than 6.8 million deaths have been reported (https://covid19.who.int).

Human individuals with COVID-19 were first reported in late December 2019, in Wuhan, China, as pneumonia of unknown aetiology. A certain proportion of these early cases were found to be linked to the Huanan Seafood Market (HSM) in Wuhan4,6, where various animal meats, exotic seafood and live animals were available for purchase. The HSM has been suspected to be the source of the COVID-19 pandemic7. Not all of the early human cases had epidemiological links to the market6,8, and alternative hypotheses for the market association (for example, entry of the virus into the market through humans or the cold chain) also exist.

SARS-CoV-2 has high similarity to a few coronaviruses derived from bats in Asian countries including China, Laos, Japan, Cambodia and Thailand, and some scientists have proposed that bats might be the original source of SARS-CoV-2 (refs. 1,8,9,10,11,12,13,14). Whether another animal might have acted as an intermediate host to facilitate virus spillover from bats to humans is still unknown15,16. An important finding was the discovery of SARS-CoV-2-related coronaviruses from pangolins, in which the spike proteins contained receptor-binding domains showing high similarity to the receptor-binding domain of SARS-CoV-2 (refs. 17,18,19). Pangolins might be involved in the ecology of coronaviruses, but whether they are the intermediate host for SARS-CoV-2 is unknown, given the current data20. A recent study documented the animal species in the HSM between May 2017 and November 2019 and noted that no pangolins or bats were present, but some animals proposed to be susceptible to sarbecoviruses, such as raccoon dogs, were present21. Thus far, the origins of SARS-CoV-2 (refs. 22,23) and the role of the HSM in the origins and spread of SARS-CoV-2 remain unclear. The data from the HSM may provide important information.

The HSM is located in the Jianghan District in the downtown area of Wuhan, the capital city of Hubei Province, and is approximately 800 m away from Hankou Railway Station, a major railway travel hub. It occupies >50,000 m2, with 678 stalls located close to each other in extremely crowded conditions (Fig. 1a). The market is separated into two zones, the East and West Zones, with seafood and animals mainly sold in the West Zone and livestock meat sold in the East Zone. Among the 678 stalls of the market, 10 stalls selling domesticated wildlife (1.5%) were identified according to sale records24, located in the southwestern corner of the West Zone (8/10) and the northwestern corner of the East Zone (2/10; Fig. 1a). According to sale records, during late December 2019, animals or animal products were sold in these 10 animal stalls. Animals included snakes, avian species (chickens, ducks, geese, pheasants and doves), sika deer, badgers, rabbits, bamboo rats, porcupines, hedgehogs, salamanders, giant salamanders, bay crocodiles, Siamese crocodiles and so on, among which snakes, salamanders and crocodiles were traded as live animals (described in detail in ref. 24).

Fig. 1: The distribution of the positive environmental samples in the HSM.

a, As the place of the early cluster of patients with COVID-19, the HSM is separated into East and West Zones with the Xinhua Road between them. To detect the presence of SARS-CoV-2 RNA, RT-qPCR was carried out. The locations of the positive samples are marked in the map of the market in orange, and the locations of the samples that the live viruses were isolated from are filled red. The map also shows locations of stalls where domesticated wildlife products were sold. b, Timeline of environmental and animal samples collected within and around the HSM. The data for confirmed cases up to 31 December 2019 were taken from ref. 24.

The market was closed in the morning of 1 January 2020, shortly after the identification of the pneumonia of unknown aetiology. On the same day, in the early morning, the Chinese Center for Disease Control and Prevention (China CDC) dispatched an epidemiological team, together with experts from Hubei Provincial CDC and Wuhan Municipal CDC, to the HSM to collect environmental samples and study the potential introduction of SARS-CoV-2 into the market (Fig. 1b). From 1 January 2020 until 2 March 2020, a total of 923 environmental samples from different locations within and around the market and 457 animal samples, including dead animals in refrigerators and freezers and stray animals and their faeces, were collected, with some stray animals sampled until 30 March (Extended Data Tables 1–3 and Supplementary Table 1). After the closure of the market, the outside surface of the rolling shutter doors of the stalls and the corridors were disinfected (with 1% bleach mixed with water) throughout January and February 2020. The goods inside the stalls were completely cleared and disinfected until early March 2020.

Out of the 923 environmental samples collected in and around the HSM, 74 were found by the quantitative real-time polymerase chain reactions (RT-qPCR, 70 positive samples) and high-throughput sequencing (Bowtie2 analysis, 4 positive samples with non-3’ poly-A reads) to be positive for SARS-CoV-2 with a positivity rate of 8.0%. Cycle threshold (Ct) values for the RT-qPCR ranged from 23.9 to 41.7 (Supplementary Table 2). Among the 828 samples from inside the HSM, 64 samples (7.7%) were positive. Of the 64 SARS-CoV-2-positive samples collected inside the HSM, 87.5% (56/64) were collected in the West Zone of the market, particularly in streets 1 to 8, with 71.4% (40/56) positive samples identified herein (Fig. 1a). Among the 14 samples from warehouses related to the HSM, 5 tested positive. This may reflect the nature of SARS-CoV-2 presence in the market during the early phase of the outbreak. Among the 51 samples from sewerage wells (Supplementary Table 1) in the surrounding areas outside the HSM, 4 tested positive (Supplementary Table 2). Notably, 1 sample (Env_0601), a floor surface swab, out of the 30 environmental samples collected from Dongxihu Market in Wuhan on 22 January 2020 also tested positive (Supplementary Table 2 and Extended Data Table 4).
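The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below only re-tallies the numbers stated in the text; the dictionary keys and the helper name `percent` are illustrative and not taken from the study.

```python
# (positives, total) per sampling location, as quoted in the paragraph above.
counts = {
    "inside HSM": (64, 828),
    "warehouses related to HSM": (5, 14),
    "sewerage wells outside HSM": (4, 51),
    "Dongxihu Market": (1, 30),
}


def percent(positives: int, total: int) -> float:
    """Positivity rate as a percentage, rounded to one decimal place."""
    return round(100 * positives / total, 1)


total_positives = sum(pos for pos, _ in counts.values())
total_samples = sum(n for _, n in counts.values())

overall_rate = percent(total_positives, total_samples)   # all 923 samples
inside_rate = percent(*counts["inside HSM"])             # samples inside the HSM
west_zone_share = percent(56, 64)                        # West Zone share of in-market positives
streets_1_to_8_share = percent(40, 56)                   # streets 1 to 8 share of West Zone positives

print(total_positives, total_samples, overall_rate, inside_rate)
```

The four location groups sum to the 74 positives and 923 samples reported, so the per-location breakdown and the overall positivity rate are mutually consistent.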

Of the 110 samples collected from sewers or sewerage wells in the market, 24 samples were positive for SARS-CoV-2 nucleic acid. All four sewerage wells in the market tested positive. During the onsite investigation of the overground drainage pathway in the HSM, we found that the wastewater in the overground drainage led into the underground drainage inside the market and then flowed into the wells on the edge of the market. We then did a spot-check sampling across all of the overground drains according to the principles described in the Methods (Extended Data Fig. 1). Excreta of the upper respiratory tract of infected humans and the potential animal waste would be mixed together into the overground drainage. Thus, these data suggested either that infected people and/or animals in the market contaminated the sewage or that the contaminated sewage may have had a role in furthering the virus transmission within the case cluster in the market.

The merchants’ activities were assessed against the RT-qPCR results of the environmental samples. The sampling covered 19.8% (134/678) of the shops in the market (95% confidence interval (CI): 16.8–23.0%). Of the positive samples, 44 were distributed among 21 shops in the market, 19 of which were located in the West Zone with the remaining 2 located in the East Zone (Fig. 1a). Some vendors sold more than one type of product. Although the results provided some indication of an association of cases with different products, no significant differences were observed between different types of shop, including those selling poultry (22%, 8/37, 95% CI: 9.8–38.2%), cold-chain products (18.4%, 16/87, 95% CI: 10.9–28.1%), aquatic products (17.8%, 13/73, 95% CI: 9.8–28.5%), livestock (14%, 5/36, 95% CI: 4.7–29.5%), seafood products (11%, 6/56, 95% CI: 4–21.9%), wildlife products (11%, 1/9, 95% CI: 0.3–48.2%) and vegetables (25%, 2/8, 95% CI: 3.2–65%; Extended Data Fig. 2 and Extended Data Table 5). The detection of SARS-CoV-2 in several shops selling different product types suggested that SARS-CoV-2 may have been circulating in the market, especially in the West Zone, for a while in December 2019, leading to an extensive distribution of the virus within the market, which may have been facilitated by the crowded buyers and the contaminated environment.
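The text does not name the method used for the 95% confidence intervals. As an illustration only, the sketch below computes an exact binomial (Clopper-Pearson) interval, which is an assumption on our part; the function names are hypothetical and the bounds are found by simple bisection using the standard library.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2 (0 when k == 0).
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2 (1 when k == n).
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Wildlife-product shops: 1 positive out of 9 sampled.
low, high = clopper_pearson(1, 9)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

For 1/9 this yields an interval of roughly 0.3% to 48%, in line with the values quoted for wildlife-product shops; the width of that interval shows how little a single positive shop out of nine constrains the underlying rate.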

The 457 animal samples included 188 individuals belonging to 18 species (with some stray animals sampled until 30 March; Extended Data Table 6). The sources of the samples included unsold goods kept in refrigerators and freezers in the stalls of the HSM, and goods kept in warehouses and refrigerators related to the HSM. Three Chinese giant salamanders, which were found in a fish tank, were alive and swab samples were collected and tested. Samples from stray animals in the market were also collected, comprising swab samples from 10 cats, 27 samples of cat faeces, 1 dog, 1 weasel and 10 rats. All of the 457 animal samples tested negative for SARS-CoV-2 nucleic acid.

To determine whether there was live virus in the HSM, we inoculated 27 SARS-CoV-2-positive environmental samples collected on 1 January 2020, into cell lines, including Vero E6 and Huh7.5 cells. Cytopathic effects were observed 3 days post inoculation with sample Env_0313 on Vero E6 cells. Cytopathic effects were also observed 5 days post inoculation on Huh7.5 cells. The electron micrographs of Vero E6 cells at 5 days post inoculation showed that virus particles were present in both the supernatant and the cells. Negative-stained virus particles and ultrathin cultured cell sections showed typical coronavirus morphology (Fig. 2). Live viruses were isolated from samples Env_0313, Env_0354 and Env_0126, which were the only three samples with Ct values < 30 in the RT-qPCR. Env_0354 and Env_0126 were two swab samples from the ground and Env_0313 was a swab sample from a wall. Notably, samples Env_0313 and Env_0126 were from stalls with confirmed cases. All of the results of successful virus isolation from the original samples with low Ct values revealed the existence of live SARS-CoV-2 with high titres in the environment of the HSM. Owing to the high Ct values, we did not attempt virus isolation from the samples collected at later time points.

Fig. 2: SARS-CoV-2 virus isolation from environmental samples of the HSM.

a–d, Electron micrographs of the SARS-CoV-2 viruses isolated from the environmental samples in the HSM. To determine whether SARS-CoV-2 particles could be visualized from the cell supernatant and lysate, we used transmission electron microscopy to observe the culture supernatant and ultrathin sections from Vero E6 and Huh7.5 cells. The electron micrographs showed that virus particles were present in both the supernatant (a,b) and the cells (c,d). Negative-stained virus particles were generally spherical, pleomorphic and 60–140 nm in diameter. Spike protrusions were observed around the particles in a crown (corona) shape (a,b). In ultrathin cultured cell sections, a group of virus particles can be seen outside the cell (c), and sheets of virus particles can also be observed inside the cells (d). The micrographs are representatives of repeated experiments.

During later sampling in the HSM in February, we collected samples to investigate the virus RNA persistence in the market. Some of these samples tested positive, particularly those from the sewage well and even the walls (Supplementary Table 2). Of the 70 RT-qPCR-positive samples, 36 samples (27 within the HSM and 9 from the surrounding area) collected in February were still positive for SARS-CoV-2. The long persistence of its genetic material in the environment might reflect high levels of environmental contamination before the market was closed. For sample Env_0838, collected from a wall on 20 February 2020, a 3-plex RT-qPCR test was carried out. The viral RNA segment was undetectable in one RT-qPCR channel targeting the N gene, but could be amplified in the other two channels targeting the RdRp and E genes, with Ct values of 32.59 and 37.34, respectively. This result is reasonable considering the degradation of the viral genome. However, the results also indicate a long persistence of the viral RNA in the environment.

We further carried out high-throughput sequencing (Supplementary Table 3) and successfully obtained seven complete or near-complete SARS-CoV-2 genome sequences, including three sequences from three environmental samples (Env_0313, Env_0354 and Env_0020), and four sequences from cell supernatants of Env_0313, Env_0354 and Env_0126 (Fig. 3 and Supplementary Table 4). A few samples were resequenced using a multiplex PCR approach, including Env_0020_seq01, Env_0313_seq04, Env_0313_seq05, Env_0126_seq06 and Env_0354_seq07 (Supplementary Tables 3 and 4). The genome sequences of three environmental samples, Env_0126, Env_0313 and Env_0354, were found to be identical to the reference strain HCoV-19/Wuhan/IVDC-HB-01/2019 (IVDC-HB-01, Global Initiative on Sharing All Influenza Data accession number: EPI_ISL_402119) and the human strain Wuhan-Hu-1 (GenBank: NC_045512; Fig. 3a). The genome sequence of the isolated virus from the environmental sample Env_0354 had two synonymous mutations compared to HCoV-19/Wuhan/IVDC-HB-01/2019, with sequence identity of 99.99% (Fig. 3a). Therefore, the SARS-CoV-2 sequences from environmental samples were highly similar to the clinical strains obtained during the early stages of the COVID-19 outbreak.
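As a worked check of the identity figure, two nucleotide differences over a full-length genome correspond to about 99.99% identity. The sketch below assumes an aligned length of 29,903 nucleotides (the length of the Wuhan-Hu-1 reference NC_045512); the actual alignment length used in the study is not stated, and the function name is illustrative.

```python
def percent_identity(mismatches: int, aligned_length: int) -> float:
    """Percentage of aligned positions that are identical."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * (1 - mismatches / aligned_length)


# Assumption: alignment spans the full Wuhan-Hu-1 reference length.
ASSUMED_GENOME_LENGTH = 29_903

identity = percent_identity(2, ASSUMED_GENOME_LENGTH)
print(f"{identity:.2f}%")
```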

Fig. 3: Genomic and phylogenetic analyses of SARS-CoV-2 virus genomes from the HSM.

a, Sequence comparison of the full-length SARS-CoV-2 genomes in the environmental samples. b, Phylogenetic analysis of full-length SARS-CoV-2 genomes from the HSM and representative strains from the early stage of the COVID-19 pandemic.

SARS-CoV-2 has been proposed to be classified into two main lineages based on the two highly linked single nucleotide polymorphisms: A lineage (8782T and 28144C, or S lineage in another nomenclature for SARS-CoV-2) and B lineage (8782C and 28144T, or L lineage). It has been proposed that the A lineage is most probably the ancestral lineage, because all of the SARS-CoV-2-related coronaviruses from bats and pangolins possessed 8782T and 28144C (refs. 25,26); Pekar et al. suggested that the two lineages may represent separate introduction events27. Phylogenetic analysis revealed that most of the environmental strains belong to the B lineage and they cluster together with the human strains circulating in the early stage of the pandemic (Fig. 3b and Supplementary Fig. 1). The phylogenetic analysis did not involve the environmental sample Env_0020, the A lineage of which was confirmed by the high number of reads mapped to positions 8,782 and 28,144 in Env_0020 (Supplementary Table 5). However, it should be noted that the genome of Env_0020 is of low quality and there are many discontinuous gaps in the assembled genome. Indeed, although it is difficult to root the SARS-CoV-2 phylogenetic tree, our analysis indicated that the environmental viruses clustered together with the human strains circulating in the early stages of the pandemic.
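The two-SNP lineage definition given above can be expressed as a small lookup. This is a minimal sketch, not the pipeline used in the study: it assumes the sequence is already aligned to reference coordinates so that positions 8,782 and 28,144 (1-based) can be indexed directly, and the function names are hypothetical.

```python
def classify_lineage(nt_8782: str, nt_28144: str) -> str:
    """Map the two linked SNPs to lineage A, lineage B or 'undetermined'."""
    pair = (nt_8782.upper(), nt_28144.upper())
    if pair == ("T", "C"):
        return "A"  # also called the S lineage
    if pair == ("C", "T"):
        return "B"  # also called the L lineage
    return "undetermined"  # gaps, ambiguous bases or mixed combinations


def lineage_from_genome(aligned_genome: str) -> str:
    """Classify a genome given in reference coordinates (assumption)."""
    if len(aligned_genome) < 28_144:
        return "undetermined"
    # Convert 1-based genome positions to 0-based string indices.
    return classify_lineage(aligned_genome[8_782 - 1], aligned_genome[28_144 - 1])


# Synthetic example sequence, not real data: all 'N' except the two sites.
toy = ["N"] * 29_903
toy[8_782 - 1], toy[28_144 - 1] = "C", "T"
print(lineage_from_genome("".join(toy)))
```

For a low-coverage sample such as Env_0020, the same rule would in practice be applied to the majority base among reads covering each site rather than to an assembled genome.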

We conducted RNA-sequencing (RNA-seq) analysis using 57 SARS-CoV-2 RT-qPCR-positive and 115 SARS-CoV-2 RT-qPCR-negative environmental samples from the HSM (Fig. 4a and Supplementary Table 3), in which the bias of sampling and RNA-seq should be considered. We used two approaches for identification of genera. The Kraken2 method with all available genes and genomes in the database was used for the identification of all genera, including those of Bacteria, viruses, Eukarya and Archaea. Additionally, the barcoding method using mitochondrial cytochrome c oxidase subunit sequences was used specifically for the identification of Chordata genera. Bacteria were the most abundant species in almost all samples and mammal species could be found in most samples, fitting the features of samples collected from the environment (Fig. 4b and Supplementary Tables 6 and 7). Gallus, Homo, Anas, Sus, Bos and Canis could be detected in most samples (Fig. 4c and Supplementary Table 8), in accordance with the environmental features of the seafood markets in China. We analysed the mammalian genera in all sequenced samples with Kraken2 (detailed in the Methods) using different thresholds. A total of 70 mammal genera, which existed in more than 2% of samples, were identified with a threshold of 100 reads per million (Fig. 4d). It is important to highlight that the results of the Kraken2 analysis (Fig. 4d) and the barcode of life data (BOLD) analysis (Extended Data Fig. 3) differ. In particular, the proportion of reads assigned as raccoon dog differs considerably between the two methods. This may be due to the heterogeneity of the reference data used by the two methods (mitochondria for BOLD; whole genome for Kraken2). It should be noted that the genera identified using current approaches might be updated with additional reference genomes. 
As such, this list is not definitive and further in-depth analysis with other methods will be required to provide more precise information regarding the wildlife species present at the market. In particular, it should be pointed out that our approach probably returned some false-positive assignments, particularly with the less-abundant genera (for example, Ailuropoda).
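The two filters described above (a threshold of 100 reads per million and presence in more than 2% of samples) can be sketched as follows. This is an illustration under stated assumptions: reads per million is computed against each sample's total read count, the threshold is treated as inclusive, and the sample names and read counts are invented toy values, not study data.

```python
def reads_per_million(genus_reads: int, total_reads: int) -> float:
    """Normalize a genus read count by sequencing depth."""
    return genus_reads * 1_000_000 / total_reads


def detected_genera(samples, rpm_threshold=100.0, min_sample_fraction=0.02):
    """Return genera passing the RPM threshold in more than the given
    fraction of samples.

    samples: dict mapping sample id -> (total_reads, {genus: read_count})
    """
    positive_samples = {}
    for total_reads, genus_counts in samples.values():
        if total_reads <= 0:
            continue
        for genus, reads in genus_counts.items():
            if reads_per_million(reads, total_reads) >= rpm_threshold:
                positive_samples[genus] = positive_samples.get(genus, 0) + 1
    n_samples = len(samples)
    return sorted(
        genus for genus, count in positive_samples.items()
        if count / n_samples > min_sample_fraction
    )


# Hypothetical toy input (three samples, made-up counts).
toy_samples = {
    "sample_1": (2_000_000, {"Bos": 5_000, "Homo": 300, "Lutra": 10}),
    "sample_2": (1_000_000, {"Bos": 50, "Homo": 150}),
    "sample_3": (4_000_000, {"Homo": 500, "Sus": 900}),
}

print(detected_genera(toy_samples))
```

In the toy input, the low-count genus falls below the per-sample threshold and is dropped, which mirrors why very low relative abundances are treated with caution as possible false-positive assignments.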

Fig. 4: Analysis of environmental samples in the HSM.

a, Schematic illustration of the experimental design. All 70 SARS-CoV-2-positive samples by RT-qPCR were included for RNA-seq. A total of 57 RNA-seq libraries were successfully constructed. Additionally, RNA-seq libraries of 115 SARS-CoV-2-negative samples passed library quality control. Kraken2 was used for genus classification. Kraken2 and the BOLD system were used for genus classification of Chordata. b, Heatmap showing the read distribution for four domains (Bacteria, Eukarya, viruses and Archaea), the Homo genus and the SARS-CoV-2 species, for SARS-CoV-2 RT-qPCR-positive or SARS-CoV-2 RT-qPCR-negative samples. c, Positive ratio of the illustrated genus in all tested samples. Top-ranked genera within the Chordata phylum are shown. d, Illustration of mammal genera in the market using the threshold of 100 reads per million based on Kraken2. The samples are grouped by SARS-CoV-2 RT-qPCR result and the NGS results analysed with Bowtie2. The blue bars indicate positively detected genera. e, Illustration of mammal genus distribution in samples with a high viral load. Data for Env_0020, Env_0313, Env_0354 and Env_0126 are shown. f, Distribution of the positively detected mammal genera in the market. Samples from four areas where multiple SARS-CoV-2 RT-qPCR-positive samples were detected are shown. The distribution of top mammal genera in each area is shown.

In particular, we analysed three samples (Env_0126, Env_0313 and Env_0354) collected on 1 January 2020 with high levels of SARS-CoV-2 (Ct value < 30; Fig. 4e). The identified mammal genera in the Env_0313 and Env_0354 samples were related to species in the general food market, such as Homo, Ovis, Bos, Canis, Sus and Felis. Many mammalian genera were observed in the Env_0126 sample, but the most abundant mammalian genera were also related to the general food market, including Bos (77.30%), Ovis (19.91%), Homo (0.77%) and Bubalus (0.57%). Pipistrellus (0.002%) and Lutra (0.001%) were also found in this sample, but at extremely low relative abundances, raising the possibility of false detection. Moreover, we also noted that only Homo, Ovis, Bos and Sus reads but not species related to wildlife were found in the Env_0020 sample, the one with A lineage.

We illustrated the top-ranked genera in four areas of the market where multiple SARS-CoV-2 RT-qPCR-positive samples were detected. As shown in Fig. 4f, the top-ranked genera in these areas were Homo or other genera that generally exist in food markets. We also noted that Nyctereutes could be found in shop 25 of street 8, and Atelerix and Erinaceus could be found in shops 15–17 of street 7 (Fig. 4f). These genera were detected in both SARS-CoV-2-positive and SARS-CoV-2-negative samples, and actually more often in negative ones (Supplementary Tables 6–9); thus, conclusions about whether these animals were infected with SARS-CoV-2 cannot be drawn.

We checked samples that might relate to wildlife, such as samples collected in the defeathering machine and areas with visible blood spots. The most abundant mammal genus of the defeathering machine sample (Env_0584) was Canis (Extended Data Fig. 3). The most abundant mammal genera of the visible blood spot sample (Env_0262) were Bos, Sus, Ovis and Bison, respectively (Extended Data Fig. 3). Additionally, we plotted the distribution of some genera of concern, including Myotis, Erinaceus, Mustela, Nyctereutes, Rhizomys, Meles and Melogale. Most of these samples were distributed in the West Zone of the market (Extended Data Fig. 4), where wildlife products were sold, but this also reflects the fact that the zone was much more intensively sampled and analysed by RNA-seq. The distributions of Homo, Sus, Bos, Gallus and Anas were also dominant in this area, which was near the areas enriched in SARS-CoV-2 RT-qPCR-positive samples. The repeated sampling of the locations with RT-qPCR-positive results may contribute some bias to the distribution analyses of areas enriched in SARS-CoV-2 RT-qPCR-positive samples. Additionally, we plotted the proportions of mammal genera in those SARS-CoV-2-positive samples with a high abundance of genera related to wildlife, such as Env_0576 (Nyctereutes enriched), Env_0807 (Lariscus enriched), Env_0809 (Erinaceus enriched) and Env_0585 (Erinaceus enriched; Extended Data Fig. 3).

Of particular note was the difference in the results from RT-qPCR and next-generation sequencing (NGS). As the RT-qPCR detection assay used in the very early stage of the pandemic was not formally verified, we believe that there may be some false positives and false negatives in the RT-qPCR detection results in this study. We also found that SARS-CoV-2 reads could be detected by NGS in a portion of SARS-CoV-2 RT-qPCR-negative samples, possibly owing to degradation of SARS-CoV-2 within the RT-qPCR target region or contamination during library building. Additionally, we observed a relatively higher positivity rate when aligning the reads to the reference SARS-CoV-2 genome with Bowtie2 (Supplementary Table 6). Therefore, more precise algorithms are required to better capture the reads of SARS-CoV-2 RNA.

In summary, we report the detection of SARS-CoV-2 RNA and live virus in environmental samples from the West Zone of the HSM. It should be noted that the selection of shops for sampling was biased because shops selling wildlife as well as shops linked to early cases were prioritized for sampling. The origin of the virus cannot be determined from the analyses available so far. Although gene barcode analysis of animal species in the study suggested that Myotis, Nyctereutes and Melogale—species that have been recognized as potential host species of sarbecoviruses—were present at the market, these barcodes were mostly detected within the SARS-CoV-2 RT-qPCR-negative samples from the environment. It remains possible that the market may have acted as an amplifier of transmission owing to the high number of visitors every day, causing many of the initially identified infection clusters in the early stages of the outbreak24.

Recent reports traced the outbreak back to the HSM and proposed, after compiling information reported by various sources, including the Joint WHO-China Study and social media, that the market sold live wild animals as recently as 2019 (ref. 28). Another report proposed that SARS-CoV-2 spilled over from animals to humans at least twice in November or December 2019, and the raccoon dog was suggested to be the intermediate host animal27. The evidence provided in this study is not sufficient to prove such a hypothesis. Our study confirmed the existence of raccoon dogs, and other potential SARS-CoV-2-susceptible animals, at the market before its closure. However, these environmental samples cannot prove that the animals were infected. Furthermore, even if the animals were infected, our study does not rule out human-to-animal transmission, as the sampling was carried out after the human infection within the market6. Thus, the possibility of potential introduction of the virus to the market through infected humans, or cold-chain products, cannot yet be ruled out.

More work, involving internationally coordinated efforts, is needed to investigate the potential origins of SARS-CoV-2 (ref. 24). Surveillance of wild animals should be enhanced to explore the potential natural and intermediate hosts for SARS-CoV-2 (refs. 7,29), if any, which would help to prevent future pandemics caused by coronaviruses of animal origin.

Note added in proof: The original, unedited Accelerated Article Preview (AAP) version of this Article contained some errors, which have been corrected in the final proof. In the AAP version, we provided the results of analysis performed in early 2020 and reported that SARS-CoV-2 was detected in 73 of the 923 environmental samples by RT–PCR. However, this was incorrect. SARS-CoV-2 had been detected in 70 of the samples by RT–PCR, and SARS-CoV-2 reads had been detected in an additional 3 samples (Env_0552, Env_0576 and Env_0585) by next-generation sequencing (NGS) followed by mapping the reads onto the reference genome NC_045512 using Bowtie2 (reads ≥ 1). At that time, in early 2020, only a few samples had been sequenced. We have now updated the Article to reflect the full set of sequencing results from all 172 samples that had sufficient RNA abundance for NGS and have indicated in the paper that 74 samples tested positive for SARS-CoV-2: 70 by RT–PCR and an additional 4 by NGS (using Bowtie2 analysis). Supplementary Table 1 has been modified to indicate which samples tested positive by RT–PCR or by Bowtie2 analysis. Supplementary Tables 2 and 5 have been revised to indicate that Env_0333 (F33) and Env_0509 (G93) did not pass quality control and NGS was not performed on these samples. We have also clarified in Supplementary Table 2 that samples Env_0552, Env_0576, Env_0585 and Env_0788 were negative by PCR but positive by sequencing and Bowtie2 analysis. Supplementary Table 6 has been revised to indicate which 33 samples were processed using the HWTSC002-16-BGI human rRNA depletion kit (BGI) and to include SARS-CoV-2 reads per sample. In the AAP version, we also incorrectly stated that sample Env_0354 (F54) was collected from a stall associated with a confirmed human case, instead of sample Env_0126 (B5). Consistent with Fig. 1a, samples Env_0313 (F13) and Env_0126 (B5), but not Env_0354 (F54), were from stalls with confirmed cases. We have also added further information to the Methods section to clarify details of the culture methods and sample processing for metagenomic sequencing.

Methods

Sample collection

The HSM was closed in the early morning of 1 January 2020, and at the same time, the China CDC began collecting environmental and animal samples. Staff from the China CDC entered the market about 30 times before the market’s final clean-up on 2 March 2020, with some stray animals sampled outside the market until 30 March. Samples in the HSM were collected as exhaustively as possible from a wide diversity of surfaces, animals and products (Supplementary Tables 1 and 2 and Extended Data Table 6) according to different sampling principles, as described in detail in ref. 24.

The principles and ranges of in-market sampling covered: environmental samples from stalls related to early cases; environmental samples from doors and floors of all stalls in the blocks where the early cases were located; environmental samples, collected by block, from the East Zone of the market; transport carts, rubbish bins and similar objects; environmental samples from stalls that sold livestock, poultry or farmed wildlife (also referred to as domesticated wildlife or domesticated wildlife products); samples of sewage and silt from drainage channels and sewerage wells; stray cats, rats and other stray animals in the market; animal products and other commodity samples kept in cold storage and refrigerators in the market; the market’s ventilation and air-conditioning system; and public toilets, public activity rooms and other places where people gathered in the market.

The investigators used full personal protective equipment during the sampling in the market. Commercially obtained swabs and virus preservation solution were used for the sampling (Disposable Virus Sampling Tube, V5-S-25, Shen Zhen Zi Jian Biotechnology). For environmental samples, sampling swabs were used to swab the floors, walls or surfaces of objects and then preserved in virus preservation solution.

For animal samples, depending on the type of animal and whether it was alive or frozen, pharyngeal, anal, body surface and body cavity swabs or tissue samples were collected for RT-qPCR. Generally, for live animals and frozen full bodies, three samples, including pharyngeal, anal and body surface swabs, were collected for each individual animal. For animal bodies after ‘bai tiao’ preparation (remaining parts of poultry or livestock after removal of hair and viscera), body cavity swabs were collected.

Drain samples were collected using virus sampling swabs to probe into the silt at the bottom of drainage channels in the market. Wastewater and silt samples were preserved in virus preservation solution. For the sewage well (for the drain water), a container was used to take a silt–water mixture from a location near the bottom of the well, and an appropriate amount of sample was collected by using virus sampling swabs and then preserved in virus preservation solution.

Nucleic acid extraction and SARS-CoV-2 RT-qPCR assay

A virus nucleic acid extraction kit (Xi’an Tianlong) was used to extract viral nucleic acid from samples using an automated nucleic acid extraction instrument according to the manufacturer’s instructions. RT-qPCR was carried out on extracted nucleic acid samples with a SARS-CoV-2 nucleic acid assay kit. The reagent brands used include BioGerm (40/38; cycle number/cutoff value), DAAN (45/40) and BGI (40/38).

Virus isolation

Virus isolation was carried out in a biosafety level-3 laboratory in the National Institute for Viral Diseases Control and Prevention, China CDC. Samples positive for SARS-CoV-2 RT-qPCR collected on 1 January 2020 were cultured in both Vero E6 and Huh7.5 cells on 11 January 2020. The cells were cultured in 24-well cell culture plates with DMEM basal medium containing 10% fetal bovine serum and 1% penicillin-streptomycin in an incubator containing 5% CO2. Homogenate supernatant was inoculated when the monolayer cell culture was about 90% confluent and adherent to the wall. The medium used was DMEM basal medium containing 2% fetal bovine serum. Three blind passages were carried out for each sample. The growth and morphological changes of the cells were observed under a microscope every day. The culture supernatant and cell pellet of each passage were collected for RT-qPCR. The morphology of viral particles in the cell sections and the supernatant was first observed by transmission electron microscopy on 22 January 2020.

Metagenomic sequencing

Metagenomic sequencing was conducted at the National Institute for Viral Disease Control and Prevention, China CDC and Wuhan BGI. Nucleic acid was extracted using Qiagen’s viral RNA microextraction kit. An enrichment kit (HWTS-C002-16-BGI, BGI, China) was used on 33 samples to improve the sensitivity of viral RNA detection. The kit is based on a probe pool that targets the human ribosomal RNA sequence. The probe pool comprises multiple oligonucleotide fragments, and viral RNA enrichment is accomplished through a sequence of steps including probe hybridization, RNase H digestion, DNase I digestion and magnetic bead purification. This treatment was chosen for these samples because of the low CT values (<30) of their internal control (human genes), which indicate a relatively high abundance of human genes. The remaining samples did not undergo this treatment. Extracted RNA was reverse transcribed into cDNA and segmented into 150–200 base pairs by enzyme digestion. After repair, fitting, purification, PCR amplification and purification, the sample concentration was assayed by DNBSEQ-T7, and an average output of more than 200 million reads was obtained. Sequencing data were compared with those in a SARS-CoV-2 database to determine whether the samples contained SARS-CoV-2 sequences. For the seven complete SARS-CoV-2 genome sequences, three sequences from environmental samples (Env_0020_seq01, Env_0313_seq02 and Env_0354_seq03) were obtained from DNBSEQ-T7, and four sequences from cell supernatants of Env_0313, Env_0354 and Env_0126 (Fig. 3) were obtained from the NextSeq 550 platform. A few samples were resequenced using a multiplex PCR approach, including Env_0020_seq01, Env_0313_seq04, Env_0313_seq05, Env_0126_seq06 and Env_0354_seq07 (Supplementary Tables 3 and 4), as described previously30. Briefly, the nucleic acid was extracted using Qiagen’s viral RNA microextraction kit. The multiplex PCR comprised a set of 102 oligonucleotide primer pairs, and the amplicons generated by the primer pairs spanned the target genome. All raw data related to the genomes, including any partial genomes that were sequenced, were fully reported and deposited to the public database (Supplementary Tables 3 and 4).

Virus genome assembly and phylogenetic analysis

Raw reads were adaptor- and quality-trimmed with the Fastp (version 0.20.0) program. The clean reads were mapped to the SARS-CoV-2 reference genome (GenBank: NC_045512) using Bowtie2. The assembled genomes were merged and checked using Geneious (version 11.1.5) (https://www.geneious.com). The coverage and depth of genomes were calculated with SAMtools (version 1.10) based on SAM files from Bowtie2.
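As an illustration of the coverage and depth calculation, the sketch below summarizes per-position depth lines in the three-column, tab-separated format printed by `samtools depth` (reference name, position, depth). It is a minimal sketch rather than the exact procedure used here: the function name `coverage_and_depth`, the `min_depth` threshold of 1 and the toy five-base genome are assumptions made for the example.

```python
def coverage_and_depth(depth_lines, genome_length, min_depth=1):
    """Summarize per-position depth records.

    depth_lines: iterable of 'ref<TAB>pos<TAB>depth' strings.
    genome_length: length of the reference in bases.
    Returns (fraction of positions with depth >= min_depth, mean depth).
    Positions missing from the input are treated as depth 0.
    """
    covered = 0
    total_depth = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        total_depth += depth
        if depth >= min_depth:
            covered += 1
    return covered / genome_length, total_depth / genome_length


if __name__ == "__main__":
    # Toy reference of 5 bases; positions 1 and 5 have no reads.
    toy = ["ref\t2\t10", "ref\t3\t20", "ref\t4\t30"]
    breadth, mean_depth = coverage_and_depth(toy, genome_length=5)
    print(breadth, mean_depth)
```

For the reference used in this study (GenBank: NC_045512), `genome_length` would be 29,903 bases.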

The reference genomes IVDC-HB-01 (Global Initiative on Sharing All Influenza Data: EPI_ISL_402119) and Wuhan-Hu-1 (GenBank: NC_045512) were used as queries. Multiple sequence alignment of the SARS-CoV-2 sequences obtained from this study and the reference sequences was carried out with Mafft (v7.450). Phylogenetic analyses were carried out using RAxML v8.2.9 with 1,000 bootstrap replicates, using the GTR nucleotide substitution model and the Gamma distribution.

Bioinformatic analysis of the species abundances

Kraken2 (version 2.1.2)31 was used for species classification with the option --confidence 0.1. Sequences of all species in the Nucleotide (nt) database were used for generating the index. Bracken (version 2.5) was used for re-estimating species abundance. The matrix of species was obtained by using Pavian32. The ggplot2 package in R was used for plotting. Read counts of each genus were used for further analysis and plotting. Raw counts for four domains (Archaea, viruses, Eukarya and Bacteria), SARS-CoV-2 and the Homo genus were used to generate a heatmap (Fig. 4b). A two-tailed unpaired t-test was used to identify genera that differed between SARS-CoV-2 RT-qPCR-positive and SARS-CoV-2 RT-qPCR-negative samples.

To characterize genera within Chordata, the reference was generated using the sequences of mitochondrial cytochrome c oxidase subunit I in the BOLD system33,34,35. RNA-seq samples were mapped to the reference sequences by Bowtie2 (ref. 36) with the default settings. Read counts of each genus were calculated by SAMtools37. A read count exceeding 20 was used as the cutoff for identifying a positively enriched genus. Fisher’s exact test was used to compare genera in the Mammalia class between SARS-CoV-2 RT-qPCR-positive and SARS-CoV-2 RT-qPCR-negative samples.
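The cutoff and the comparison described above can be sketched as follows. This is an illustrative reading, not the authors' script: it assumes that each sample is scored as detected or not detected for a genus (read count above 20) and that Fisher's exact test is applied to the resulting 2 × 2 table. The function names and the example read counts are hypothetical.

```python
from math import comb


def contingency(counts_pos, counts_neg, cutoff=20):
    """Build a 2x2 table for one genus from per-sample read counts.

    A sample counts as 'detected' when its read count exceeds the cutoff.
    Returns (a, b, c, d): detected / not detected in RT-qPCR-positive
    samples, then detected / not detected in RT-qPCR-negative samples.
    """
    a = sum(1 for n in counts_pos if n > cutoff)
    c = sum(1 for n in counts_neg if n > cutoff)
    return a, len(counts_pos) - a, c, len(counts_neg) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x detections in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


if __name__ == "__main__":
    # Hypothetical read counts for one genus in four positive and four negative samples.
    table = contingency([25, 100, 30, 5], [0, 21, 3, 7])
    print(table, fisher_exact_two_sided(*table))
```

With only a handful of samples per group, the test has little power, so a genus detected in most positive samples may still fail to reach significance.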

Ethics

The sample collection was determined by the China CDC to be part of the emergency response to the outbreak of pneumonia of unknown aetiology and therefore was exempt from institutional review board assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.