The Europe PMC Author Manuscript Collection consists of articles in author manuscript form that have been made available in Europe PMC and PubMed Central (PMC) in compliance with Europe PMC funder policies and the public access policies of NIH and other funders that participate in PMC. The text of manuscripts in the Collection may be downloaded in XML and plain text formats.
These files are available for text mining. They may also be used in other ways consistent with the principles of applicable copyright law.
The files can be accessed using Europe PMC's FTP service.
The files are packaged by PMCID. The numeric part of the PMCID is zero-padded to nine digits, and its first three digits determine the package. For example, an author manuscript XML with the PMCID PMC3947720 (padded: PMC003947720) would be packaged in the file author_manuscript_xml.PMC003xxxxxx.baseline.2022-12-16.tar.gz. As of December 2022, all author manuscripts have PMCIDs in the range PMC001xxxxxx to PMC009xxxxxx. Note that these files are large (up to 4 GB each).
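The mapping from a PMCID to its package can be expressed as a small function. This is a minimal sketch, assuming the naming pattern in the example above holds for every package; the `txt` variant of the file name and the default date are assumptions, and `package_name` is a hypothetical helper, not part of any Europe PMC tool.

```python
import re


def package_name(pmcid: str, fmt: str = "xml", date: str = "2022-12-16") -> str:
    """Return the baseline package name expected to contain `pmcid`.

    Assumption: packages follow the pattern
    author_manuscript_<fmt>.PMC<NNN>xxxxxx.baseline.<date>.tar.gz,
    where <NNN> is the first three digits of the nine-digit padded PMCID.
    The 'txt' form of <fmt> is assumed, not confirmed.
    """
    match = re.fullmatch(r"PMC(\d{1,9})", pmcid.strip().upper())
    if match is None:
        raise ValueError(f"not a PMCID: {pmcid!r}")
    padded = match.group(1).zfill(9)  # "3947720" -> "003947720"
    bucket = padded[:3]               # first three digits select the package
    return f"author_manuscript_{fmt}.PMC{bucket}xxxxxx.baseline.{date}.tar.gz"


print(package_name("PMC3947720"))
```

For PMC3947720 this prints author_manuscript_xml.PMC003xxxxxx.baseline.2022-12-16.tar.gz, matching the example. Padding before slicing keeps short PMCIDs (for example PMC123) in the PMC000xxxxxx bucket instead of misreading their leading digits.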
The files containing the XML of all the articles are:
The plain text files containing the extracted full text are:
Each day, a set of three files is made available for both the XML and TXT versions of the author manuscripts. For example, the files for the TXT versions on 2022-12-17 are:
An equivalent set of files is available for the corresponding XML versions of the author manuscripts.
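Because the archives can be up to 4 GB, it may help to read them as a stream rather than unpacking them to disk first. The sketch below uses Python's standard `tarfile` module in sequential mode (`"r|gz"`), which decompresses the archive front to back without seeking. It assumes each archive holds one file per manuscript with an `.xml` or `.txt` suffix; that member layout is an assumption, not something this page specifies, and `iter_members` is a hypothetical helper name.

```python
import tarfile
from typing import BinaryIO, Iterator, Tuple


def iter_members(fileobj: BinaryIO, suffix: str = ".xml") -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, raw bytes) for each regular file ending in `suffix`.

    Mode "r|gz" reads the gzip stream sequentially, so each member must be
    read before moving on to the next one. The suffix filter reflects an
    assumed archive layout.
    """
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(suffix):
                continue  # skip directories and non-matching files
            extracted = archive.extractfile(member)
            if extracted is not None:
                yield member.name, extracted.read()


# Example usage (file name taken from the packaging example; path is local):
# with open("author_manuscript_xml.PMC003xxxxxx.baseline.2022-12-16.tar.gz", "rb") as fh:
#     for name, data in iter_members(fh):
#         ...  # parse or index `data` here
```

Only one member's contents are held in memory at a time, which keeps memory use roughly bounded by the largest single manuscript rather than by the archive size.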