In response to the COVID-19 pandemic, the FDA developed quality-controlled reference sequence data for the SARS-CoV-2 reference strain for the United States, in collaboration with the Centers for Disease Control and Prevention (CDC), the Biodefense and Emerging Infections Research Resources Repository (BEI Resources), the Institute for Genome Sciences at the University of Maryland, and the National Center for Biotechnology Information (NCBI).
The availability of traceable, quality-controlled data will help test and vaccine developers:
- Expedite development of medical countermeasures
- Identify new or more stable targets for future tests
- Enable in silico confirmation of targets
- Support development of synthetic reference material
- Enable viral population (quasispecies) analysis
Genomic RNA isolated from a clinical sample from the first confirmed case of COVID-19 in the United States (Washington State, January 22, 2020) yielded the SARS-CoV-2 reference material now available from BEI Resources. This reference was passaged four times in total: three passages in Vero cells at CDC, followed by one passage in Vero E6 cells at BEI Resources.
Before the SARS-CoV-2 reference strain was sequenced, the genomic RNA and the prepared Illumina libraries were quality controlled for two approaches: (1) shotgun sequencing, using the Ovation RNA-Seq System V2 kit, and (2) target-capture sequencing, using the New England Biolabs Ultra II Directional RNA library preparation kit with H/M/R Riboreduction. For target-capture sequencing, the library was enriched using Twist custom target enrichment for COVID-19. Raw data from both sequencing approaches were used to generate the FDA-ARGOS SARS-CoV-2 reference sequence.
The FDA-ARGOS SARS-CoV-2 reference sequences (FDAARGOS_983; MT233526.1 [shotgun]; MT246667.1 [target-capture]) had 100% identity to the 2019-nCoV/USA-WA1/2020 strain (GenBank accession MN985325.1). FDA-ARGOS SARS-CoV-2 reference sequence metadata, genome assemblies, raw data, and protocols are publicly available under the accessions listed below.
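Percent identity is commonly expressed as the number of identical alignment columns divided by the alignment length. The sketch below is illustrative only: it assumes the two sequences have already been aligned (equal length, with `-` marking gaps), and `percent_identity` is a hypothetical helper. Alignment tools such as BLAST may define the denominator differently, so this is not necessarily the method used for the FDA-ARGOS comparison.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumes both inputs are already aligned (same length, '-' for gaps).
    A column counts as identical only if both bases match and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real SARS-CoV-2 data.
    print(percent_identity("ACGTACGT", "ACGTACGT"))  # 100.0
    print(percent_identity("ACGTACGT", "ACGAACG-"))  # 75.0 (one mismatch, one gap)
```

Two identical full-length sequences give 100.0; any mismatch or gap column lowers the score.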
SARS-CoV-2 Reference Sequence Data:
- Reference sequence from shotgun data under GenBank accession MT233526.1
- Reference sequence from target-capture data under GenBank accession MT246667.1
- Metadata under BioSample accession SAMN143844141
- Raw data and protocol from shotgun sequencing under SRA accession SRX7972536
- Raw data and protocol from target-capture sequencing under SRA accession SRX7988130
- Reference material from BEI Resources catalog NR-52281 (lot 70033135)
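The GenBank assemblies listed above can also be retrieved programmatically. As a minimal sketch, assuming NCBI's public E-utilities `efetch` endpoint, the hypothetical helper `efetch_fasta_url` below builds URLs that return each nucleotide record as FASTA text. It only constructs URLs; actually downloading them requires network access and is subject to NCBI's E-utilities usage guidelines (rate limits, and the `tool` and `email` identification parameters).

```python
from urllib.parse import urlencode

# Public NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL returning one nucleotide record as FASTA text.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    params = {
        "db": "nuccore",     # nucleotide database (GenBank records)
        "id": accession,     # versioned accession, e.g. "MT233526.1"
        "rettype": "fasta",  # FASTA instead of a full GenBank flat file
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # FDA-ARGOS shotgun and target-capture assemblies, plus the WA1 comparator.
    for acc in ("MT233526.1", "MT246667.1", "MN985325.1"):
        print(efetch_fasta_url(acc))
    # To download (requires network access):
    #   from urllib.request import urlopen
    #   fasta_text = urlopen(efetch_fasta_url("MT233526.1")).read().decode()
```

Raw reads under the SRX accessions are a separate case; they are usually obtained through the SRA rather than as GenBank records.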
About FDA-ARGOS
In May 2014, the FDA and collaborators established FDA-ARGOS, a publicly available database for reference-grade microbial sequences. With funding support from FDA’s Office of Counterterrorism and Emerging Threats (OCET) and the Department of Defense (DoD), the FDA-ARGOS team is initially collecting and sequencing 2,000 microbes, including biothreat microorganisms, common clinical pathogens, and closely related species.
FDA-ARGOS genomes meet the quality metrics for reference-grade genomes for regulatory use. FDA-ARGOS reference genomes are de novo assembled with high depth of base coverage and placed within a pre-established phylogenetic tree. Each microbial isolate in the database is covered at a minimum of 20X over 95 percent of the assembled core genome. Sample-specific metadata, raw reads, assemblies, annotations, and details of the bioinformatics pipeline are also available.
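The 20X-over-95-percent criterion can be read as a simple check on a per-base depth profile. The sketch below assumes that profile has already been computed (for example, the depth column of `samtools depth -a` output over the assembled genome); the function names are hypothetical, and this is not FDA-ARGOS's actual QC pipeline.

```python
from typing import Sequence


def fraction_at_depth(depths: Sequence[int], min_depth: int = 20) -> float:
    """Fraction of positions whose read depth is at least `min_depth`."""
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def meets_coverage_criterion(
    depths: Sequence[int], min_depth: int = 20, min_fraction: float = 0.95
) -> bool:
    """True if at least `min_fraction` of positions reach `min_depth`.

    Defaults mirror the stated criterion: 20X over 95 percent of positions.
    """
    return fraction_at_depth(depths, min_depth) >= min_fraction


if __name__ == "__main__":
    # Toy depth profiles (not real data), 100 positions each.
    passing = [25] * 95 + [10] * 5  # exactly 95% of positions at >= 20X
    failing = [25] * 94 + [10] * 6  # only 94% of positions at >= 20X
    print(meets_coverage_criterion(passing))  # True
    print(meets_coverage_criterion(failing))  # False
```

Keeping the fraction and the pass/fail decision in separate functions makes it easy to report the actual coverage value alongside the verdict.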
Manufacturers who develop sequence-based tests to identify infectious agents and/or to detect resistance or virulence markers can use FDA-ARGOS to advance their development programs and to support the regulatory science review of such tests. For example, FDA-ARGOS can be used as a tool for in silico (computer simulation) data analysis.
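One simple form of in silico analysis is checking whether a primer or probe sequence is present in a reference genome on either strand. The sketch below is a deliberately simplified illustration: it uses exact string matching only (real inclusivity analyses usually tolerate mismatches and rely on alignment tools such as BLAST), and `find_exact_sites` and the toy sequences are hypothetical.

```python
from __future__ import annotations

# Complement table for A/C/G/T/N; other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_exact_sites(reference: str, oligo: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for exact oligo matches.

    '+' means the oligo occurs in the reference as written; '-' means its
    reverse complement does (it binds the opposite strand). Palindromic
    oligos are reported on both strands.
    """
    ref = reference.upper()
    hits: list[tuple[int, str]] = []
    for strand, probe in (("+", oligo.upper()), ("-", reverse_complement(oligo))):
        start = ref.find(probe)
        while start != -1:
            hits.append((start, strand))
            start = ref.find(probe, start + 1)  # also finds overlapping matches
    return sorted(hits)


if __name__ == "__main__":
    reference = "TTGACCATGGCAAAT"  # toy sequence, not SARS-CoV-2
    print(find_exact_sites(reference, "GACC"))  # [(2, '+')]
    print(find_exact_sites(reference, "TTGC"))  # [(9, '-')]
    print(find_exact_sites(reference, "GGGG"))  # []
```

An empty result flags a target that would need closer review against the reference sequence.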