Author manuscript collection

The Europe PMC Author Manuscript Collection consists of articles in author manuscript form that have been made available in Europe PMC and PubMed Central (PMC) in compliance with Europe PMC funder policies and the public access policies of NIH and other funders that participate in PMC. The text of manuscripts in the Collection may be downloaded in XML and plain text formats.

Copyright

These files are available for text mining. They may also be used in other ways consistent with the principles of applicable copyright law.

Download the author manuscript collection

The files can be accessed using Europe PMC's FTP service.

The files are packaged by PMCID range. For example, an author manuscript with the PMCID PMC3947720 is packaged in the XML file author_manuscript_xml.PMC003xxxxxx.baseline.2022-12-16.tar.gz. As of December 2022, all author manuscript PMCIDs fall in the range PMC001xxxxxx to PMC009xxxxxx. Note that these files are large (up to 4 GB each).
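The bucketing rule can be sketched as a small Python function. It assumes the pattern implied by the PMC3947720 example: the numeric part of the PMCID is zero-padded to 9 digits, and the first 3 digits select the archive. The function name `baseline_archive_name` is hypothetical, and applying the same pattern to the TXT archives (`fmt="txt"`) is an assumption, since only the XML name is given above.

```python
import re


def baseline_archive_name(pmcid: str, fmt: str = "xml",
                          baseline_date: str = "2022-12-16") -> str:
    """Return the (assumed) baseline archive name that holds `pmcid`.

    Hypothetical helper: pads the PMCID digits to 9 places and uses the
    first 3 digits as the bucket, matching the PMC3947720 example.
    """
    match = re.fullmatch(r"PMC(\d+)", pmcid.strip().upper())
    if match is None:
        raise ValueError(f"not a PMCID: {pmcid!r}")
    digits = match.group(1)
    if len(digits) > 9:
        # The documented buckets only cover 9-digit (padded) PMCIDs.
        raise ValueError(f"PMCID outside the documented bucket scheme: {pmcid!r}")
    padded = digits.zfill(9)             # "3947720" -> "003947720"
    bucket = f"PMC{padded[:3]}xxxxxx"    # "003947720" -> "PMC003xxxxxx"
    return f"author_manuscript_{fmt}.{bucket}.baseline.{baseline_date}.tar.gz"


print(baseline_archive_name("PMC3947720"))
# author_manuscript_xml.PMC003xxxxxx.baseline.2022-12-16.tar.gz
```

The baseline date is a parameter because it is part of the file name; pass the date of the baseline you actually downloaded.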

The files that contain the XML of all of the articles are:

The plain text files containing the extracted full text are:

A set of three files is made available daily for both the XML and TXT versions of the author manuscripts. For example, the TXT files for 2022-12-17 are:

  1. author_manuscript_txt.incr.2022-12-17.filelist.csv, containing metadata in CSV format for all manuscripts added on 17 December 2022
  2. author_manuscript_txt.incr.2022-12-17.filelist.txt, containing metadata in TXT format for all manuscripts added on 17 December 2022
  3. author_manuscript_txt.incr.2022-12-17.tar.gz, containing the plain-text content of all manuscripts added on 17 December 2022
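Because the daily file names follow a fixed pattern, a sync script can generate the names it expects for a given day. The sketch below is hypothetical: the function name is illustrative, and it assumes the XML set uses the same names with `xml` in place of `txt`.

```python
from datetime import date


def daily_increment_files(day: date, fmt: str = "txt") -> list[str]:
    """Expected names of the three daily files for one format ("txt" or "xml").

    Hypothetical helper based on the 2022-12-17 TXT example; the "xml"
    variant is an assumption.
    """
    stem = f"author_manuscript_{fmt}.incr.{day.isoformat()}"
    return [
        f"{stem}.filelist.csv",  # manuscript metadata, CSV format
        f"{stem}.filelist.txt",  # manuscript metadata, TXT format
        f"{stem}.tar.gz",        # manuscript content
    ]


for name in daily_increment_files(date(2022, 12, 17)):
    print(name)
```

To catch up after missing several days, call the function once per date (for example, stepping with `datetime.timedelta(days=1)`) and fetch any names not yet downloaded.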

An equivalent set of files is available for the XML versions of the author manuscripts.
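Since both the baseline and daily `.tar.gz` archives can be large, it can help to process them as a stream. The sketch below uses the standard-library `tarfile` module in `"r|gz"` mode, which reads the compressed archive strictly sequentially without seeking. Each member is read and decoded one at a time, so memory use tracks the largest single file rather than the whole archive. The function name is hypothetical, and UTF-8 is an assumed encoding.

```python
import tarfile
from typing import Iterator


def iter_archive_texts(path: str, encoding: str = "utf-8") -> Iterator[tuple[str, str]]:
    """Yield (member name, decoded content) for each regular file in a .tar.gz.

    Hypothetical helper. "r|gz" opens the archive as a forward-only stream,
    so each member must be fully read before moving to the next one.
    """
    with tarfile.open(path, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, etc.
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # Assumed encoding; undecodable bytes are replaced rather than raising.
            yield member.name, handle.read().decode(encoding, errors="replace")
```

For example, `for name, text in iter_archive_texts("author_manuscript_txt.incr.2022-12-17.tar.gz"): ...` visits every manuscript in that day's TXT archive. The same function works for XML archives, whose contents can then be passed to an XML parser.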