Each file must be UTF-8 encoded.
Each row in the annotations file will contain one JSON object with all
the annotations for a specific article:
{ "src":"MED", "id": "27105176", "provider": "europepmc", "anns": [ { "position": "1.2", "prefix": "Noninvasive Markers to Assess ", "exact": "Liver Fibrosis", "section": "Title", "postfix": ". ", "tags": [ { "name": "Liver Fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] }, {"position": "2.1", "prefix": "", "exact": "Chronic liver disease", "section": "Abstract", "postfix": " represents a major public health proble", "tags": [ { "name": "Chronic liver disease", "uri": "http://linkedlifedata.com/resource/umls-concept/C0341439" } ] }, {"position": "3.2", "prefix": " and progression of ", "exact": "liver fibrosis", "section": "Abstract", "postfix": " with time and the r", "tags": [ { "name": "liver fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] }, {"position": "3.4", "prefix": "h time and the risk of development of ", "exact": "cirrhosis", "section": "Abstract", "postfix": ". ", "tags": [ { "name": "cirrhosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0023890" } ] }, {"position": "7.2", "prefix": "essing the presence and the degree of ", "exact": "liver fibrosis", "section": "Abstract", "postfix": ". ", "tags": [ { "name": "liver fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] }, {"position": "8.2", "prefix": "e methods useful in the evaluation of ", "exact": "liver fibrosis", "section": "Abstract", "postfix": ". ", "tags": [ { "name": "liver fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] } ] }
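The one-object-per-row layout described above can be read line by line. The following is a minimal sketch, not an official tool: the function name `read_annotations` and the choice to skip blank lines are assumptions made for illustration.

```python
import json


def read_annotations(path):
    """Yield one parsed article record per non-empty row of a UTF-8 file."""
    # The file is opened explicitly as UTF-8, as the format requires.
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                # Assumption: blank rows are ignored rather than rejected.
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(
                    f"row {line_number}: invalid JSON ({error})"
                ) from error
```

Each yielded value is a dictionary with the keys shown in the example row, so `record["anns"]` gives the list of annotations for that article.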
There are two types of annotations in the platform: sentence-based
annotations and named entity annotations.
The JSON schema that each object must adhere to depends on the
annotation type. More details, as well as guidelines for validating
data before submission, can be found
here.
An example of sentence-based annotations for one article:
{
  "src": "PMC", # source of the article
  "id": "PMC5844054", # identifier of the article in the context of the source field
  "provider": "Disgenet", # name of the provider
  "anns": [
    {
      "exact": ".... loss of SBP1 may play... aggressive disease.", # annotation sentence
      "section": "abstract", # section of the article where the annotation was found.
      "tags": [{ # tagged entities
        "name": "... loss of SBP1 may play... aggressive disease.", # identifying name of the tagged entity
        "uri": "http://purl.uniprot.org/uniprot/Q13228" # specific URI of the tagged entity
      }]
    },.... # other annotations elements go here
  ]
}
An example of named entity annotations for one article:
{
  "src": "MED", # source of the article
  "id": "27105176", # identifier of the article in the context of the source field
  "provider": "europepmc", # name of the provider
  "anns": [
    {
      "position": "1.2", # position of the entity in the article
      "prefix": "Noninvasive Markers to Assess ", # prefix of the entity inside the sentence of the article
      "postfix": ". ", # postfix of the entity inside the sentence of the article
      "exact": "Liver Fibrosis", # entity referred by the annotation
      "section": "Title", # section of the article where the annotation was found.
      "tags": [{ # tagged entities
        "name": "Liver Fibrosis", # identifying name of the tagged entity
        "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" # specific URI of the tagged entity
      }]
    },.... # other annotations elements go here
  ]
}
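The `#` comments in the two examples above are explanatory only; JSON itself has no comment syntax, so a submitted row must omit them. As a rough sketch of how a named entity record could be assembled and written as a single row, the helpers below (`make_named_entity_annotation` and `to_row`, both hypothetical names) use only Python's standard `json` module:

```python
import json


def make_named_entity_annotation(position, exact, name, uri,
                                 prefix="", postfix="", section=None):
    """Build one element of the "anns" list for a named entity annotation."""
    annotation = {
        "position": position,
        "prefix": prefix,
        "postfix": postfix,
        "exact": exact,
        "tags": [{"name": name, "uri": uri}],
    }
    if section is not None:
        # "section" is only added when the caller supplies it.
        annotation["section"] = section
    return annotation


def to_row(src, article_id, provider, annotations):
    """Serialise one article record as a single line of JSON text."""
    record = {"src": src, "id": article_id,
              "provider": provider, "anns": annotations}
    # Without an indent argument json.dumps emits no line breaks;
    # ensure_ascii=False keeps non-ASCII text readable in a UTF-8 file.
    return json.dumps(record, ensure_ascii=False)


row = to_row(
    "MED", "27105176", "europepmc",
    [make_named_entity_annotation(
        "1.2", "Liver Fibrosis", "Liver Fibrosis",
        "http://linkedlifedata.com/resource/umls-concept/C0239946",
        prefix="Noninvasive Markers to Assess ", postfix=". ",
        section="Title")],
)
```

Writing each `row` followed by a newline to a file opened with `encoding="utf-8"` gives the one-object-per-row layout described at the top of this page.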
Here is the list of fields with their explanations:
Name | Meaning | Notes |
---|---|---|
src | Source of the article |
Mandatory field. It has to be one of the following values:
- MED: PubMed MEDLINE abstract
- PMC: PubMed Central full text article
- PAT: Patents
- AGR: Agricola (USDA/NAL)
- CBA: Chinese biological abstracts
- HIR: NHS Evidence (UK HIR)
- CTX: CiteXplore submission
- ETH: EThOS theses (BL)
- CIT: CiteSeer (PSU)
|
id |
Identifier of the article in the context of the src field
provided
| Mandatory field |
provider | Name of the provider |
Mandatory field. It must match the identifying name assigned
to the provider during the subscription phase
|
anns | List of annotations | Mandatory field |
anns.position | Position of the annotation |
Mandatory only for named entity annotations. It gives the
relative order of mined entities within an article and is used
to help locate and highlight them in the text. E.g., "1.3"
means that the entity was found in the third chunk of the
first sentence of the article
|
anns.prefix |
Portion of the sentence that appears before the tagged entity
|
Relevant only for named entity annotations. For each
annotation, at least one of prefix and postfix must be
specified
|
anns.postfix |
Portion of the sentence that appears after the tagged entity
|
Relevant only for named entity annotations. For each
annotation, at least one of prefix and postfix must be
specified
|
anns.exact | Text of the tagged entity | Mandatory field |
anns.section |
Name of the section of the article where the tagged entity
appears
|
Optional field. For annotation datasets corresponding to a
full text article (src=PMC) the list of possible values is:
- Title
- Abstract
- Introduction
- Methods
- Results
- Discussion
- Acknowledgments
- References
- Table
- Figure
- Case study
- Supplementary material
- Conclusion
- Abbreviations
- Competing Interests
- Article (default to use if the annotation cannot be
mapped to any of the other sections)
For any other article source the possible values are:
|
anns.tags | List of the entities tagged by this annotation |
Mandatory field. This list should contain at least one tag
|
anns.tags.name | Name of the tagged entity | Mandatory field |
anns.tags.uri |
URI to the ID or the accession number the entity is linked to
(e.g., UniProt: http://purl.uniprot.org/uniprot/[Acs_number]).
| Mandatory field |
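The field rules in the table above can be turned into a simple pre-submission check. This is only an illustrative sketch and does not replace the official schema validation: the function name `check_record` is hypothetical, and two readings of the table are assumptions, namely that a position is two integers separated by a dot and that an empty prefix or postfix string counts as not specified.

```python
import re

# Source codes listed for the src field in the table above.
ALLOWED_SOURCES = {"MED", "PMC", "PAT", "AGR", "CBA", "HIR", "CTX", "ETH", "CIT"}

# Assumption: a position looks like "<sentence>.<chunk>", e.g. "1.3".
POSITION_PATTERN = re.compile(r"\d+\.\d+")


def check_record(record, named_entity=True):
    """Return a list of problems found in one article record (empty if none)."""
    problems = []
    for field in ("src", "id", "provider", "anns"):
        if field not in record:
            problems.append(f"missing mandatory field: {field}")
    if "src" in record and record["src"] not in ALLOWED_SOURCES:
        problems.append(f"unknown src value: {record['src']!r}")

    for index, ann in enumerate(record.get("anns", [])):
        where = f"anns[{index}]"
        if "exact" not in ann:
            problems.append(f"{where}: missing mandatory field: exact")
        tags = ann.get("tags") or []
        if not tags:
            problems.append(f"{where}: tags must contain at least one tag")
        for tag_index, tag in enumerate(tags):
            for field in ("name", "uri"):
                if field not in tag:
                    problems.append(
                        f"{where}.tags[{tag_index}]: missing mandatory field: {field}"
                    )
        if named_entity:
            # position, prefix and postfix apply to named entity annotations only.
            position = ann.get("position")
            if not isinstance(position, str) or not POSITION_PATTERN.fullmatch(position):
                problems.append(f"{where}: position missing or malformed")
            if not ann.get("prefix") and not ann.get("postfix"):
                problems.append(f"{where}: prefix or postfix must be specified")
    return problems
```

For sentence-based annotations, calling `check_record(record, named_entity=False)` skips the position, prefix and postfix checks. The section values are not checked here because the field is optional.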