Significant reductions in the cost of genome sequencing in the past decade have ushered in the era of “Big Data,” where biologists collect immense datasets, seeking patterns that may explain important diseases or identify drug and vaccine targets. But to be useful, this deluge of data must be organized, maintained and made accessible to researchers.
Since 2000, a team led by University of Pennsylvania and University of Georgia scientists has been responsible for developing genome database resources for microbial pathogens, including the parasites responsible for malaria, sleeping sickness, toxoplasmosis and many other important diseases.
The National Institute of Allergy and Infectious Diseases (NIAID) has awarded the institutions a new contract for 2014-15 worth $4.3 million to continue this important work. If all option years are exercised, the five-year award value is expected to total $23.4 million.
The contract supports the Eukaryotic Pathogen Genomics Database, or EuPathDB. By providing the global scientific community with free access to a wealth of genomic data related to pathogens important to human health and biosecurity, EuPathDB expedites biomedical research in the lab, field and clinic, enabling the development of innovative diagnostics, therapies and vaccines.
One of four Pathogen Bioinformatics Resource Centers, or BRCs, supported by the National Institutes of Health, EuPathDB encompasses disease-causing eukaryotes, which are organisms that possess a membrane-bound nucleus. Other BRCs support data on viruses, bacteria and insect vectors of disease.
Since its prototype was launched in 1999, EuPathDB has grown increasingly complex. It now comprises about nine terabytes of data and has been cited more than 8,000 times in the scientific literature.
The latest contract marks the third time that the National Institutes of Health has awarded support to EuPathDB, building on previous contracts issued in 2004 and 2009, as well as prior grant funding from the NIH and the Burroughs Wellcome Fund. Affiliated projects have also been supported by the U.K.-based Wellcome Trust, the Bill & Melinda Gates Foundation, the Sloan Foundation, the World Health Organization, the U.S. Department of Agriculture, the Brazilian government and other organizations.
Plasmodium species are responsible for malaria, causing an estimated 200 million illnesses and 600,000 deaths each year. These parasites were among the first to be integrated into EuPathDB, but the database has since expanded greatly, leveraging core infrastructure supported by the NIH contract to incorporate more than 3,000 genomes from more than 300 species.
Others include important threats to public water supplies, such as Cryptosporidium, Entamoeba and Giardia; Toxoplasma gondii, a parasite responsible for neurological disease in infants and immunocompromised adults; Trichomonas; and numerous other important fungal and agricultural pathogens.
“It is truly inspiring to see how access to these on-line resources has helped to invigorate and engage scientific colleagues around the world,” Roos said. “EuPathDB occupies a large global footprint.”
While NIH funding supports core infrastructure, additional partners have helped to expand the project’s reach. For example, the Bill & Melinda Gates Foundation and the Wellcome Trust helped extend the EuPathDB project to cover parasites responsible for leishmaniasis, African sleeping sickness and Chagas disease.
“Recent years have witnessed a dramatic increase in research and drug discovery for these organisms, and we are glad that EuPathDB has helped to move this work forward,” Roos said.
Using EuPathDB and other resources, researchers around the world can now conduct cutting-edge research “in silico,” on the computer, maximizing the chance of success when the work is translated to the lab or clinic.