The United States government has amassed incredible amounts of data. Much of it is freely available to the public through file downloads or APIs. For example, the National Center for Biotechnology Information (NCBI) within the U.S. National Institutes of Health (NIH) manages 43 publicly accessible biomedical databases collectively called Entrez databases.
Entrez databases contain diverse data, such as references and abstracts for over 30 million biomedical journal articles, genome sequences, and classification and nomenclature for all organisms in the public sequence databases.
Data engineers, data analysts, data scientists, and software developers can leverage the diverse biomedical and biotechnology data stored in Entrez databases for their projects. For example, I have developed several solutions to query, retrieve, and analyze data from the Entrez PubMed biomedical article abstract database.
This article describes the Entrez Programming Utilities (E-utilities), which you can use to access Entrez databases. It also demonstrates the Python c_e_info class, which queries metadata about the databases. The class calls the Entrez EInfo utility to obtain the list of Entrez databases and the metadata for any one of the 43 databases. You can use other E-utilities to search the databases and retrieve biomedical data for your projects.
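To give a sense of what an EInfo call involves, here is a minimal sketch using only the Python standard library. It is not the c_e_info class itself; the function names (`build_einfo_url`, `parse_db_names`, `parse_db_info`, `fetch`) are hypothetical helpers chosen for illustration. It assumes that EInfo returns XML by default, with database names under `DbList/DbName` and per-database metadata under `DbInfo`.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL of the EInfo E-utility.
EINFO_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def build_einfo_url(db=None):
    """Build an EInfo URL (hypothetical helper).

    Without db, EInfo lists all database names.
    With db (for example "pubmed"), it returns metadata for that database.
    """
    if db is None:
        return EINFO_URL
    return EINFO_URL + "?" + urllib.parse.urlencode({"db": db})


def parse_db_names(xml_data):
    """Extract database names from an EInfo list response (assumed layout)."""
    root = ET.fromstring(xml_data)
    return [element.text for element in root.findall("./DbList/DbName")]


def parse_db_info(xml_data):
    """Extract a few metadata fields from an EInfo response for one database.

    Fields absent from the response are returned as None.
    """
    root = ET.fromstring(xml_data)
    info = root.find("./DbInfo")
    if info is None:
        return {}
    fields = ("DbName", "MenuName", "Description", "Count", "LastUpdate")
    return {name: info.findtext(name) for name in fields}


def fetch(url, timeout=30):
    """Download the raw response body for a URL (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

With network access, `parse_db_names(fetch(build_einfo_url()))` should return the list of Entrez database names, and `parse_db_info(fetch(build_einfo_url("pubmed")))` should return a small dictionary of PubMed metadata. Keeping the parsing separate from the download makes the helpers easy to test against saved responses. NCBI asks clients without an API key to stay at or below three requests per second, so space out repeated calls.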