CAMI

Participate by downloading datasets and submitting your computed assembly, binning or profiling results.

Note!

You can speed up downloads by using our camiClient.jar.
Please use the databases provided below (see the Databases section).

Datasets

2nd CAMI Challenge Rhizosphere challenge

Data can be downloaded from: https://frl.publisso.de/data/frl:6425521/plant_associated/

https://doi.org/10.4126/FRL01-006425521

Rhizosphere challenge:
Data set description: Simulated short read and 
 ...

Competition End

Sun Nov 29 23:00:00 UTC 2020

Additional Resources

Note!

You need to login / register to see competition datasets

2nd CAMI Challenge: Clinical pathogen detection challenge

Data can be downloaded from: https://frl.publisso.de/data/frl:6425521/patmgCAMI2.tar.gz

Case description: 
A 32-year-old woman presented to an Emergency Center on March 22nd 2018 because of vomiting,
 ...

Competition End

Thu Oct 17 22:00:00 UTC 2019

Additional Resources

2nd CAMI Challenge Pathogen Dataset Raw Data

2nd CAMI Challenge Marine Dataset

Data can be downloaded from: https://frl.publisso.de/data/frl:6425521/marine/

https://doi.org/10.4126/FRL01-006425521

Data set description: Simulated short read and long read shotgun metagenome data
 ...

Competition End

Thu Oct 17 22:00:00 UTC 2019

2nd CAMI Challenge Strain Madness Dataset

Data can be downloaded from: https://frl.publisso.de/data/frl:6425521/strain/

https://doi.org/10.4126/FRL01-006425521

Data set description: Simulated long read and short read shotgun metagenome data
 ...

Competition End

Thu Oct 17 22:00:00 UTC 2019

2nd CAMI Toy Human Microbiome Project Dataset

Data can be downloaded from: https://frl.publisso.de/data/frl:6425518/

https://doi.org/10.4126/FRL01-006425518

Dataset description: Simulated metagenome data from five different body sites of the human host, namely gastrointestinal tract, oral cavity, airways, skin and urogenital tract.
Underlying genome sources: NCBI RefSeq complete genomes, 07.08.2017
Underlying microbiome profile source: HMP
Taxonomy used: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_DATABASES/taxdump_cami2_toy.tar.gz

Sample descriptions: Simulated Illumina HiSeq metagenome data
Number of samples: 49 (10 GI tract, 10 oral cavity, 10 airways, 10 skin, 9 urogenital tract)
Total size: 245 Gbp
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size s.d.: 20 bp


Sample descriptions: Simulated Pacific Biosciences metagenome data
Number of samples: 49 (10 GI tract, 10 oral cavity, 10 airways, 10 skin, 9 urogenital tract)
Total size: 245 Gbp
Average read length: 3,000 bp
Read length s.d.: 1,000 bp
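
As a rough sanity check on the figures above, the per-sample data volume and the expected number of reads can be estimated from the totals. The sketch below assumes that 1 Gbp means 10^9 bp and that the total size is spread evenly over the 49 samples; neither assumption is stated on this page, and the function name is purely illustrative.

```python
def per_sample_estimate(total_gbp, n_samples, bases_per_fragment):
    """Return (bp per sample, fragments per sample), assuming an even split."""
    per_sample_bp = total_gbp * 10**9 / n_samples
    return per_sample_bp, per_sample_bp / bases_per_fragment

# Short reads: 245 Gbp over 49 samples, 2x150 bp per read pair.
short_bp, read_pairs = per_sample_estimate(245, 49, 2 * 150)
# Long reads: 245 Gbp over 49 samples, 3,000 bp average read length.
long_bp, long_reads = per_sample_estimate(245, 49, 3000)

print(f"{short_bp / 1e9:.1f} Gbp per sample, about {read_pairs / 1e6:.1f} million read pairs")
print(f"{long_bp / 1e9:.1f} Gbp per sample, about {long_reads / 1e6:.2f} million long reads")
```

Under these assumptions each sample holds about 5 Gbp, which corresponds to roughly 16.7 million read pairs or 1.67 million long reads.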

You can download the data using the CAMI client Java tool: https://data.cami-challenge.org/camiClient.jar

Choose one of the following URLs:

Airways: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Airways
Gastrointestinal tract: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Gastrointestinal_tract
Oral cavity: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Oral
Skin: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Skin
Urogenital tract: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Urogenital_tract

List all files:
> java -jar camiClient.jar -l URL

Download all files:
> java -jar camiClient.jar -d URL . -p .

If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve.
Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . -p PATTERN

Download just fastq files:
> java -jar camiClient.jar -d URL . -p fq.gz

Download just the gold standard assembly including the mapping of contigs to reference genomes:
> java -jar camiClient.jar -d URL . -p gsa

Download just sample_1 files from skin samples:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Skin . -p sample_1
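
The commands above can also be assembled programmatically, for example to fetch the same subset from all five body sites. The following is a minimal sketch that uses only the flags shown on this page (-l, -d, -p). It assumes that camiClient.jar sits in the working directory and that the "." after the URL is the destination directory; the function names are illustrative, not part of the client.

```python
import shlex

BASE = "https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/"
# Container names as listed on this page.
BODY_SITES = {
    "airways": "CAMI_Airways",
    "gastrointestinal_tract": "CAMI_Gastrointestinal_tract",
    "oral_cavity": "CAMI_Oral",
    "skin": "CAMI_Skin",
    "urogenital_tract": "CAMI_Urogenital_tract",
}

def list_cmd(site, jar="camiClient.jar"):
    """Build the command that lists all files of one body site (-l URL)."""
    return ["java", "-jar", jar, "-l", BASE + BODY_SITES[site]]

def download_cmd(site, pattern=".", dest=".", jar="camiClient.jar"):
    """Build the download command (-d URL DEST -p PATTERN); DEST is an assumption."""
    return ["java", "-jar", jar, "-d", BASE + BODY_SITES[site], dest, "-p", pattern]

# Print one download command per body site, restricted to sample_1 files.
for site in BODY_SITES:
    print(shlex.join(download_cmd(site, pattern="sample_1")))
```

Each returned list can be passed to subprocess.run(cmd, check=True) to actually start the download.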

Folder structure:

Sample folders start with the date of creation and end with the sample number:
yyyy.mm.dd_hh.mm.ss_sample_#
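
When processing many samples, it can help to split such folder names into creation time and sample number. A minimal sketch follows, assuming a 24-hour clock and a purely numeric sample number; the folder name in the example is made up for illustration.

```python
import re
from datetime import datetime

# yyyy.mm.dd_hh.mm.ss_sample_# with a numeric sample number (assumed).
SAMPLE_DIR = re.compile(r"^(\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2})_sample_(\d+)$")

def parse_sample_folder(name):
    """Return (creation datetime, sample number), or None if the name does not match."""
    match = SAMPLE_DIR.match(name)
    if match is None:
        return None
    created = datetime.strptime(match.group(1), "%Y.%m.%d_%H.%M.%S")
    return created, int(match.group(2))

# Hypothetical folder name, followed by a name that is not a sample folder.
print(parse_sample_folder("2018.01.23_11.53.11_sample_1"))
print(parse_sample_folder("genomes"))
```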

Every sample folder contains three subfolders: bam, contigs and reads.

The bam folder contains the mapping of all the created reads to the input genomes.
It holds one BAM file for every genome for which at least one read was produced. Each file is uniquely identified by the combination of the OTU and a running ID counter over the genomes included in that OTU in the sample:
OTU_ID.bam

The contigs folder contains the gold standard assembly for that particular sample.
It contains two files. The gold standard in FASTA format:
anonymous_gsa.fasta.gz
And the mapping of each contig to its genome/taxon ID and position in that genome:
gsa_mapping.tsv.gz

The reads folder contains the created reads for that sample.
It contains two files. The FASTQ reads themselves, with both ends for paired-end sequencing and with anonymised names:
anonymous_reads.fq.gz
And a mapping of every single read to the genome it originated from and to its original read ID (before anonymisation):
reads_mapping.tsv.gz
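
The mapping files are gzip-compressed, tab-separated text, so they can be streamed without unpacking them first. The sketch below is a generic reader that makes no assumption about the column layout, since this page does not spell it out. The demo file and its contents are invented for illustration and do not reflect the real columns.

```python
import csv
import gzip
import os
import tempfile

def iter_tsv_gz(path):
    """Yield each row of a gzip-compressed TSV file as a list of strings."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)

# Demo on a tiny made-up file; the column contents are NOT the real layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_mapping.tsv.gz")
    with gzip.open(demo_path, "wt", newline="") as out:
        out.write("read_a\tgenome_x\nread_b\tgenome_y\n")
    rows = list(iter_tsv_gz(demo_path))

print(rows)
```

Because the reader is a generator, a large file such as reads_mapping.tsv.gz can be processed row by row without loading it into memory.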

Every data set contains one abundance file per sample, mapping OTUs to genomes:
abundance#.tsv

Every data set contains the gold standard assembly pooled over all samples in the folder:
anonymous_gsa_pooled.fasta.gz

The config file used to create the data set (it can be used as input to CAMISIM to re-create the data set):
config.ini

The mapping from the original (BIOM) OTU name to the genome FASTA file:
genome_to_id.tsv

The genomes folder contains all the reference genomes used over all samples (using the mapping from genome_to_id.tsv):
genomes
This folder contains all the FASTA files of the downloaded genomes:
genome_name.fa

Since the contigs are anonymised, a file mapping each contig to its genome/taxon ID and position in the respective genome is provided:
gsa_pooled_mapping.tsv.gz

Two tax IDs are mapped to each input OTU (from the BIOM file): one at the level at which the OTU was mapped to the NCBI taxonomy, and one for the specifically downloaded genome. The file also contains a novelty_category column for cases where new genomes are provided; otherwise this column is "new_strain" and can be ignored:
metadata.tsv

In the top folder “hybrid” there are assembly and binning gold standards created from both the short and long read data sets. For every sample as well as all samples pooled, the bam-files of short and long read simulators (as described for the “bam” subfolder above) are merged and the gold standards calculated the same way as for the individual short or long read samples.

Competition End

Wed Dec 30 23:00:00 UTC 2020

2nd CAMI Toy Mouse Gut Dataset

Data can be downloaded from: https://frl.publisso.de/data/frl:6421672/

https://doi.org/10.4126/FRL01-006421672

Data set description: Simulated metagenome data from the guts of different mice, vendors and positions in the gut
Underlying genome sources: NCBI RefSeq scaffolds, 18.1.2018
Underlying microbiome profile source: still unreleased
Taxonomy used: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_DATABASES/taxdump_cami2_toy.tar.gz

Sample descriptions: Simulated Illumina HiSeq metagenome data
Number of samples: 64 (12 different mice microbiota)
Total size: 320 Gbp
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size s.d.: 20 bp

Sample descriptions: Simulated Pacific Biosciences metagenome data
Number of samples: 64 (12 different mice microbiota)
Total size: 320 Gbp
Average read length: 3,000 bp
Read length s.d.: 1,000 bp


You can download the data using the CAMI client Java tool: https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT

If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve.
Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . -p PATTERN

Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT . -p .

Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT . -p fq.gz

Download just the gold standard assembly including the mapping of contigs to reference genomes:
> java -jar camiClient.jar -d URL . -p gsa

Download just sample_0 files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT . -p sample_0
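
Before starting a large download, it can be useful to check locally which files a PATTERN would select from a listing. This page does not say how the client matches PATTERN. The fact that "-p ." downloads every file is consistent with a regular expression searched against each file path, and the sketch below assumes exactly that. The listing is hypothetical, and the helper is not part of camiClient.

```python
import re

def preview_selection(listing, pattern):
    """Return the paths a regex-style PATTERN would select (assumed semantics)."""
    regex = re.compile(pattern)
    return [path for path in listing if regex.search(path)]

# Hypothetical listing; real paths come from `java -jar camiClient.jar -l URL`.
listing = [
    "2018.01.23_11.53.11_sample_0/reads/anonymous_reads.fq.gz",
    "2018.01.23_11.53.11_sample_0/contigs/anonymous_gsa.fasta.gz",
    "2018.01.23_11.53.11_sample_1/reads/anonymous_reads.fq.gz",
    "anonymous_gsa_pooled.fasta.gz",
]
for pattern in (".", "fq.gz", "gsa", "sample_0"):
    print(pattern, len(preview_selection(listing, pattern)))
```

If the assumption holds, a pattern such as sample_1 would also select sample_10 to sample_19, so it is worth checking the listing before downloading.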

Folder structure:

Metadata in human-readable/text format
Sample folders start with the date of creation and end with the sample number:
yyyy.mm.dd_hh.mm.ss_sample_#

Every sample folder contains three subfolders: bam, contigs and reads.

The bam folder contains the mapping of all the created reads to the input genomes.
It holds one BAM file for every genome for which at least one read was produced. Each file is uniquely identified by the combination of the OTU and a running ID counter over the genomes included in that OTU in the sample:
OTU_ID.bam

The contigs folder contains the gold standard assembly for that particular sample.
It contains two files. The gold standard in FASTA format:
anonymous_gsa.fasta.gz
And the mapping of each contig to its genome/taxon ID and position in that genome:
gsa_mapping.tsv.gz

The reads folder contains the created reads for that sample.
It contains two files. The FASTQ reads themselves, with both ends for paired-end sequencing and with anonymised names:
anonymous_reads.fq.gz
And a mapping of every single read to the genome it originated from and to its original read ID (before anonymisation):
reads_mapping.tsv.gz

Every data set contains one abundance file per sample, mapping OTUs to genomes:
abundance#.tsv

Every data set contains the gold standard assembly pooled over all samples in the folder:
anonymous_gsa_pooled.fasta.gz

The config file used to create the data set (it can be used as input to CAMISIM to re-create the data set):
config.ini

The mapping from the original (BIOM) OTU name to the genome FASTA file:
genome_to_id.tsv
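
To look up the reference genome behind an OTU, this mapping can be loaded into a dictionary. The sketch below assumes two tab-separated columns without a header line (OTU name, then FASTA path), which this page does not confirm. The rows in the demo are invented.

```python
import io

def load_genome_to_id(handle):
    """Map OTU name -> genome FASTA path from two-column, tab-separated lines."""
    mapping = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        otu, fasta_path = line.split("\t", 1)
        mapping[otu] = fasta_path
    return mapping

# Made-up rows for illustration only; real names and paths will differ.
demo = io.StringIO("Otu1\tgenomes/genome_a.fa\nOtu2\tgenomes/genome_b.fa\n")
genome_of = load_genome_to_id(demo)
print(genome_of["Otu2"])
```

With a real download, the same function can be applied to an open file handle, for example open("genome_to_id.tsv").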

The genomes folder contains all the reference genomes used over all samples (using the mapping from genome_to_id.tsv):
genomes
This folder contains all the FASTA files of the downloaded genomes:
genome_name.fa

Since the contigs are anonymised, a file mapping each contig to its genome/taxon ID and position in the respective genome is provided:
gsa_pooled_mapping.tsv.gz

Two tax IDs are mapped to each input OTU (from the BIOM file): one at the level at which the OTU was mapped to the NCBI taxonomy, and one for the specifically downloaded genome. The file also contains a novelty_category column for cases where new genomes are provided; otherwise this column is "new_strain" and can be ignored:
metadata.tsv

In the folder “hybrid” there are assembly and binning gold standards created from both the short and long read data sets. For every sample as well as all samples pooled, the bam-files of short and long read simulators (as described for the “bam” subfolder above) are merged and the gold standards calculated the same way as for the individual short or long read samples.

Competition End

Wed Dec 30 23:00:00 UTC 2020

1st CAMI Challenge Dataset 1 CAMI_low

Data can be downloaded from https://doi.org/10.5524/100344

Previous download location: http://gigadb.org/dataset/100344

Low complexity data set for the 1st CAMI challenge:

simulated Illumina HiSeq
 ...

Competition End

Fri Jul 17 22:00:00 UTC 2015

Samples

Low

1st CAMI Challenge Dataset 2 CAMI_medium

Data can be downloaded from https://doi.org/10.5524/100344

Previous download location: http://gigadb.org/dataset/100344

Medium complexity data set for the 1st CAMI challenge:

medium complexity com
 ...

Competition End

Fri Jul 17 22:00:00 UTC 2015

CAMI_medium.tar (Public Access)

Download size: 26.5 GB

Samples

Sample 1

Sample 2

Additional Resources

CAMI 1 Data (Public Access)

1st CAMI Challenge Dataset 3 CAMI_high

Data can be downloaded from https://doi.org/10.5524/100344

Previous download location: http://gigadb.org/dataset/100344

High complexity data set for the 1st CAMI challenge:

Time series with 5 Hise
 ...

Competition End

Fri Jul 17 22:00:00 UTC 2015

Samples

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

1. Toy Test Dataset Low_Complexity

Data can be downloaded from https://doi.org/10.5524/100344

Previous download location: http://gigadb.org/dataset/100344

This is a toy data set simulated from public genomes. You can use this for te
 ...

2. Toy Test Dataset Medium_Complexity

Data can be downloaded from https://doi.org/10.5524/100344

Previous download location: http://gigadb.org/dataset/100344

This is a toy data set simulated from public genomes. You can use this for te
 ...

Samples

MC_Sample1_180bp

MC_Sample1_5kb

MC_Sample2_180bp

MC_Sample2_5kb

3. Toy Test Dataset High_Complexity

Data can be downloaded from https://doi.org/10.5524/100344

Previous download location: http://gigadb.org/dataset/100344

This is a toy data set simulated from public genomes. You can use this for te
 ...

Samples

HC_Sample1

HC_Sample2

HC_Sample3

HC_Sample4

HC_Sample5

Databases

CAMI 1 Challenge NCBI Taxonomy database

NCBI Taxonomy database as of 2015/06/22 to be used for CAMI 1 challenge datasets

CAMI 1 NCBI RefSeq and Taxonomy Database as of 2015/06/22

This tarball is a copy of the NCBI RefSeq and Taxonomy databases as of 2015/06/22. It should be used as the basis for reference-based binning and profiling tools on the CAMI 1 challenge datasets.

CAMI 1 Taxonomy database for camiClient

Taxonomy database to be used with the CAMI upload client (camiClient)

CAMI 2 Challenge Accession to Taxid Mapping

NCBI accession to taxid mapping as of 2019/01/08 to be used for CAMI 2 challenge datasets

CAMI 2 Challenge Blast nr

NCBI nr database as of 2019/01/08 to be used for CAMI 2 challenge datasets

CAMI 2 Challenge Blast nt

NCBI nt database as of 2019/01/08 to be used for CAMI 2 challenge datasets