Participate by downloading datasets and submitting your computed assembly, binning or profiling.
Note!
You can speed up downloads by using our camiClient.jar.
Please use the databases we provide, listed below (Databases tab).
Data can be downloaded from:
https://frl.publisso.de/data/frl:6425521/plant_associated/
https://doi.org/10.4126/FRL01-006425521

Rhizosphere challenge

Data set description: Simulated short read and long read shotgun metagenome data from samples taken from a plant rhizosphere environment
Underlying genome sources: Undisclosed
Underlying microbiome profile source: Undisclosed
Taxonomy used: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_DATABASES/taxdump_cami2_toy.tar.gz

Sample descriptions: Simulated Illumina HiSeq metagenome data
Number of samples: 21
Total size: 105 Gb
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size s.d.: 20 bp

Sample descriptions: Simulated Pacific Bioscience metagenome data
Number of samples: 21
Total size: 105 Gb
Average read length: 3,000 bp
Read length s.d.: 1,000 bp

Sample descriptions: Simulated Oxford Nanopore metagenome data
Number of samples: 21
Total size: 105 Gb
Average read length: 1,610 bp
Read length s.d.: ~3,000 bp
Note!
You need to login / register to see competition datasets
Data can be downloaded from: https://frl.publisso.de/data/frl:6425521/patmgCAMI2.tar.gz

Case description: A 32-year-old woman presented to an Emergency Center on March 22nd 2018 because of vomiting, abdominal pain and severe nosebleeds. She reported feeling well until 5 days prior to admission, when she began to develop fever, joint pain and muscle pain. Four days before admission she presented to her general practitioner and was diagnosed with influenza-like illness. One day before admission, her state rapidly deteriorated with onset of intense abdominal pain, followed by vomiting and nosebleeds, prompting her to present to the Emergency Center. She had never been hospitalized for any medical illness. She denied any recent trauma. Four days prior to onset of symptoms, she had returned from a one-month hiking trip between Fethiye and Antalya in Turkey. She denied any unusual contact with wildlife or eating raw meats during her trip.

The hospital has sent you a nasal swab for sequencing in order to identify the causative agent. You have generated a paired-end MiSeq sequence sample from this for further analysis. The results of classical molecular tests are still pending.

Expected submissions:
taxa.txt: a plain text file with a single line containing a tab-separated list of the NCBI taxonomy IDs of the pathogens found in the sample
pathogen.txt: a plain text file containing only a single taxonomy ID, that of the pathogen responsible for the symptoms

For selecting the appropriate taxonomy IDs, use the taxonomy database for the CAMI 2 challenge (also listed on the Databases tab), available from:
NCBI Taxonomy: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_taxonomy.tar

Please fill in and submit the following form in order to get access to the CAMI 2 datasets.
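The two submission files described above are simple enough to write by hand, but a small helper avoids formatting slips such as spaces instead of tabs. The following is a minimal sketch, not an official CAMI tool: the function name write_submission and the trailing newline at the end of each file are assumptions, and any taxonomy IDs you pass in must come from your own analysis.

```python
from pathlib import Path


def write_submission(out_dir, found_taxids, causative_taxid):
    """Write taxa.txt and pathogen.txt (hypothetical helper, not part of CAMI)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # taxa.txt: one single line, taxonomy IDs separated by tabs.
    # int() rejects anything that is not a plain numeric ID.
    taxa_line = "\t".join(str(int(t)) for t in found_taxids)
    taxa_path = out / "taxa.txt"
    taxa_path.write_text(taxa_line + "\n")  # trailing newline is an assumption

    # pathogen.txt: only the single taxonomy ID of the causative pathogen.
    pathogen_path = out / "pathogen.txt"
    pathogen_path.write_text(f"{int(causative_taxid)}\n")

    return taxa_path, pathogen_path
```

Calling write_submission("submission", ids_found, id_causative) creates both files in the folder "submission"; the two argument names here are placeholders for your own results.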
Data can be downloaded from:
https://frl.publisso.de/data/frl:6425521/marine/
https://doi.org/10.4126/FRL01-006425521

Data set description: Simulated short read and long read shotgun metagenome data from samples at different seafloor locations of a marine environment
Underlying genome sources: Undisclosed
Underlying microbiome profile source: Undisclosed

Sample descriptions: Simulated Illumina HiSeq metagenome data
Number of samples: 10
Total size: 50 Gb
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size s.d.: 20 bp

Sample descriptions: Simulated Pacific Bioscience metagenome data
Number of samples: 10
Total size: 50 Gb
Average read length: 3,000 bp
Read length s.d.: 1,000 bp

Assemblies for Binning and Profiling Challenge
The assemblies for the binning and profiling challenge are available in the download section. We provide two files:
Gold Standard Assembly: The gold standard assembly includes all regions from reference genomes which have at least 1x coverage after short read simulation. Samples have been pooled for this gold standard.
File: marmgCAMI2_short_read_pooled_gold_standard_assembly.fasta.gz
Megahit Assembly: Megahit assembly generated from pooled samples.
File: marmgCAMI2_short_read_pooled_megahit_assembly.fasta.gz

Please use the following CAMI 2 challenge databases for the profiling and taxonomic binning challenge (also listed on the Databases tab):
Blast nr: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_blast/nr.gz
Blast nt: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_blast/nt.gz
NCBI Taxonomy: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_taxonomy.tar
Accession to Taxid Mapping: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_taxonomy_accession2taxid.tar

Please fill in and submit the following form in order to get access to the CAMI 2 datasets.
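After downloading either of the two assemblies, a quick sanity check is to count its contigs and bases, for example to compare the gold standard with the Megahit assembly. The sketch below is not part of the CAMI tooling. It assumes only that the file is a gzip-compressed FASTA, as the .fasta.gz file names suggest, and the function name fasta_stats is made up for illustration.

```python
import gzip


def fasta_stats(path):
    """Return (number of contigs, total bases) for a gzip-compressed FASTA file."""
    n_contigs = 0
    total_bases = 0
    # "rt" decompresses on the fly and yields text lines, so the
    # assembly is never loaded into memory as a whole.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n_contigs += 1  # every header line starts a new contig
            else:
                total_bases += len(line.strip())
    return n_contigs, total_bases
```

For example, fasta_stats("marmgCAMI2_short_read_pooled_gold_standard_assembly.fasta.gz") would summarise the pooled gold standard once the file is in the working directory.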
Data can be downloaded from:
https://frl.publisso.de/data/frl:6425521/strain/
https://doi.org/10.4126/FRL01-006425521

Data set description: Simulated long read and short read shotgun metagenome data, including a large amount of strain-level variation
Underlying genome sources: Undisclosed
Underlying microbiome profile source: Undisclosed

Sample descriptions: Simulated Illumina HiSeq metagenome data
Number of samples: 100
Total size: 200 Gb
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size s.d.: 20 bp

Sample descriptions: Simulated Pacific Bioscience metagenome data
Number of samples: 100
Total size: 200 Gb
Average read length: 3,000 bp
Read length s.d.: 1,000 bp

Assemblies for Binning and Profiling Challenge
The assemblies for the binning and profiling challenge are available in the download section. We provide two files:
Gold Standard Assembly: The gold standard assembly includes all regions from reference genomes which have at least 1x coverage after short read simulation. Samples have been pooled for this gold standard.
File: strmgCAMI2_short_read_pooled_gold_standard_assembly.fasta.gz
Megahit Assembly: Megahit assembly generated from pooled samples.
File: strmgCAMI2_short_read_pooled_megahit_assembly.fasta.gz

Please use the following CAMI 2 challenge databases for the profiling and taxonomic binning challenge (also listed on the Databases tab):
Blast nr: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_blast/nr.gz
Blast nt: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_blast/nt.gz
NCBI Taxonomy: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_taxonomy.tar
Accession to Taxid Mapping: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_2_DATABASES/ncbi_taxonomy_accession2taxid.tar

Please fill in and submit the following form in order to get access to the CAMI 2 datasets.
Data can be downloaded from:
https://frl.publisso.de/data/frl:6425518/
https://doi.org/10.4126/FRL01-006425518

Dataset description: Simulated metagenome data from five different body sites of the human host, namely gastrointestinal tract, oral cavity, airways, skin and urogenital tract.
Underlying genome sources: NCBI RefSeq complete genomes, 07.08.2017
Underlying microbiome profile source: HMP
Taxonomy used: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_DATABASES/taxdump_cami2_toy.tar.gz

Sample descriptions: Simulated Illumina HiSeq metagenome data
Number of samples: 49 (10 GI tract, 10 oral cavity, 10 airways, 10 skin, 9 urogenital tract)
Total size: 245 Gbp
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size s.d.: 20 bp

Sample descriptions: Simulated Pacific Bioscience metagenome data
Number of samples: 49 (10 GI tract, 10 oral cavity, 10 airways, 10 skin, 9 urogenital tract)
Total size: 245 Gbp
Average read length: 3,000 bp
Read length s.d.: 1,000 bp

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

Choose one of the following URLs:
Airways: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Airways
Gastrointestinal tract: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Gastrointestinal_tract
Oral cavity: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Oral
Skin: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Skin
Urogenital tract: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Urogenital_tract

List all files:
> java -jar camiClient.jar -l URL
Download all files:
> java -jar camiClient.jar -d URL . -p .
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . -p PATTERN
Download just fastq files:
> java -jar camiClient.jar -d URL . -p fq.gz
Download just the gold standard assembly including the mapping of contigs to reference genomes:
> java -jar camiClient.jar -d URL . -p gsa
Download just sample_1 files from skin samples:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_Skin . -p sample_1

Folder structure:
Sample folders start with the date of creation and end with the sample number: yyyy.mm.dd_hh.mm.ss_sample_#
In every sample folder there are three subfolders, bam, contigs and reads:
bam: contains the mapping of all the created reads to the input genomes. Inside this folder is a bam file for every genome for which at least one read was produced, which is uniquely indicated by a combination of OTU and a running ID counter for the number of genomes included in that OTU in the sample: OTU_ID.bam
contigs: contains the gold standard assembly for that particular sample. It contains two files, the gold standard in fasta format (anonymous_gsa.fasta.gz) and the mapping of each contig to its genome/taxon id and position in this genome (gsa_mapping.tsv.gz).
reads: contains the created reads for that sample. It contains two files, one with the fq reads themselves, containing both ends for paired end sequencing and with anonymised names (anonymous_reads.fq.gz), and a mapping of every single read to the genome it originated from and the original read ID, pre anonymisation (reads_mapping.tsv.gz).

Every data set also contains:
abundance#.tsv: one abundance file per sample mapping OTUs to genomes
anonymous_gsa_pooled.fasta.gz: the pooled gold standard assembly over all samples in the folder
config.ini: config file used for creating the data set at hand (can be used as input to CAMISIM for re-creating the data set)
genome_to_id.tsv: mapping from the original (BIOM) OTU name to the genome fasta file
genomes: folder containing all the reference genomes used over all samples (using the mapping from genome_to_id.tsv). This folder contains all the fasta files of the downloaded genomes: genome_name.fa
gsa_pooled_mapping.tsv.gz: since the contigs are anonymized, a file mapping each contig to its genome/taxon id and position in the respective genome is provided
metadata.tsv: to each input OTU (from the BIOM file), two tax IDs are mapped, one at the level at which the OTU was mapped to the NCBI taxonomy and one for the specifically downloaded genome. The file also contains a novelty_category column in case new genomes are provided; otherwise this column is "new_strain" and can be ignored.

In the top folder “hybrid” there are assembly and binning gold standards created from both the short and long read data sets. For every sample as well as all samples pooled, the bam-files of short and long read simulators (as described for the “bam” subfolder above) are merged and the gold standards calculated the same way as for the individual short or long read samples.
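When scripting over a downloaded data set, it helps to tell sample folders apart from the other top-level entries (genomes, hybrid, and so on). The following is a minimal sketch based only on the naming scheme yyyy.mm.dd_hh.mm.ss_sample_# given above; the function name parse_sample_dir is hypothetical, and the sketch assumes the sample number is a plain non-negative integer.

```python
import re
from datetime import datetime

# yyyy.mm.dd_hh.mm.ss_sample_#  (the trailing # is the sample number)
SAMPLE_DIR = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2})_sample_(\d+)$"
)


def parse_sample_dir(name):
    """Return (creation datetime, sample number), or None for other folders."""
    match = SAMPLE_DIR.match(name)
    if match is None:
        return None  # e.g. "genomes" or "hybrid"
    created = datetime.strptime(match.group(1), "%Y.%m.%d_%H.%M.%S")
    return created, int(match.group(2))
```

A folder name that does not follow the scheme simply yields None, so the function can be used as a filter when walking the download directory.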
Data can be downloaded from:
https://frl.publisso.de/data/frl:6421672/
https://doi.org/10.4126/FRL01-006421672

Data set description: Simulated metagenome data from the guts of different mice, vendors and positions in the gut
Underlying genome sources: NCBI RefSeq scaffolds, 18.1.2018
Underlying microbiome profile source: still unreleased
Taxonomy used: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_DATABASES/taxdump_cami2_toy.tar.gz

Sample descriptions: Simulated Illumina HiSeq metagenome data
Number of samples: 64 (12 different mice microbiota)
Total size: 320 Gbp
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size s.d.: 20 bp

Sample descriptions: Simulated Pacific Bioscience metagenome data
Number of samples: 64 (12 different mice microbiota)
Total size: 320 Gbp
Average read length: 3,000 bp
Read length s.d.: 1,000 bp

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . -p PATTERN
Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT . -p .
Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT . -p fq.gz
Download just the gold standard assembly including the mapping of contigs to reference genomes:
> java -jar camiClient.jar -d URL . -p gsa
Download just sample_0 files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT . -p sample_0

Folder structure:
Metadata is provided in human-readable/text format.
Sample folders start with the date of creation and end with the sample number: yyyy.mm.dd_hh.mm.ss_sample_#
In every sample folder there are three subfolders, bam, contigs and reads:
bam: contains the mapping of all the created reads to the input genomes. Inside this folder is a bam file for every genome for which at least one read was produced, which is uniquely indicated by a combination of OTU and a running ID counter for the number of genomes included in that OTU in the sample: OTU_ID.bam
contigs: contains the gold standard assembly for that particular sample. It contains two files, the gold standard in fasta format (anonymous_gsa.fasta.gz) and the mapping of each contig to its genome/taxon id and position in this genome (gsa_mapping.tsv.gz).
reads: contains the created reads for that sample. It contains two files, one with the fq reads themselves, containing both ends for paired end sequencing and with anonymised names (anonymous_reads.fq.gz), and a mapping of every single read to the genome it originated from and the original read ID, pre anonymisation (reads_mapping.tsv.gz).

Every data set also contains:
abundance#.tsv: one abundance file per sample mapping OTUs to genomes
anonymous_gsa_pooled.fasta.gz: the pooled gold standard assembly over all samples in the folder
config.ini: config file used for creating the data set at hand (can be used as input to CAMISIM for re-creating the data set)
genome_to_id.tsv: mapping from the original (BIOM) OTU name to the genome fasta file
genomes: folder containing all the reference genomes used over all samples (using the mapping from genome_to_id.tsv). This folder contains all the fasta files of the downloaded genomes: genome_name.fa
gsa_pooled_mapping.tsv.gz: since the contigs are anonymized, a file mapping each contig to its genome/taxon id and position in the respective genome is provided
metadata.tsv: to each input OTU (from the BIOM file), two tax IDs are mapped, one at the level at which the OTU was mapped to the NCBI taxonomy and one for the specifically downloaded genome. The file also contains a novelty_category column in case new genomes are provided; otherwise this column is "new_strain" and can be ignored.

In the folder “hybrid” there are assembly and binning gold standards created from both the short and long read data sets. For every sample as well as all samples pooled, the bam-files of short and long read simulators (as described for the “bam” subfolder above) are merged and the gold standards calculated the same way as for the individual short or long read samples.
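If you want to drive the downloads from a script, the camiClient invocations shown above can be assembled as argument lists. This is a sketch under stated assumptions: it uses only the -l, -d and -p options that appear in the commands for this data set, the function name cami_client_cmd and its defaults are hypothetical, and it builds the command without running it.

```python
import shlex


def cami_client_cmd(url, dest=".", pattern=None, jar="camiClient.jar"):
    """Build a camiClient command as a list of arguments.

    With pattern=None the command lists all files (-l URL).
    Otherwise it downloads files matching the pattern (-d URL DEST -p PATTERN).
    """
    base = ["java", "-jar", jar]
    if pattern is None:
        return base + ["-l", url]
    return base + ["-d", url, dest, "-p", pattern]


MOUSEGUT = "https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMISIM_MOUSEGUT"

# Print the commands for inspection before running anything.
print(shlex.join(cami_client_cmd(MOUSEGUT)))
print(shlex.join(cami_client_cmd(MOUSEGUT, pattern="sample_0")))
```

To actually run a command, the returned list can be passed to subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell quoting problems with URLs and patterns.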
Data can be downloaded from https://doi.org/10.5524/100344
Previous download location: http://gigadb.org/dataset/100344

Low complexity data set for the 1st CAMI challenge: simulated Illumina HiSeq data, small insert size.
Number of samples: 1
Total Size: 15 Gbp
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size stddev: 27 bp

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_LOW
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . PATTERN
Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_LOW . .
Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_LOW . fq.gz
Download just the gold standard assembly of all samples:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_LOW . gold_standard_low_single.fasta.gz
Download just the mapping of contigs to reference genomes:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_LOW . gsa_mapping.binning
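For planning runtime and memory it can be useful to turn the figures above into an approximate number of read pairs. The following is a back-of-the-envelope sketch, not a statement about the actual file contents: it assumes that 1 Gbp means 10^9 base pairs, that the total size counts sequenced bases, and that every read has the full stated length. The function name is made up for illustration.

```python
def estimated_read_pairs(total_bp, read_length_bp):
    """Rough number of read pairs for paired-end data of a given total size."""
    # Each pair contributes two reads of read_length_bp bases.
    return total_bp // (2 * read_length_bp)


# Low complexity data set: 15 Gbp in total, 2x150 bp reads.
low_pairs = estimated_read_pairs(15 * 10**9, 150)
print(low_pairs)
```

Under these assumptions the low complexity sample works out to 50,000,000 read pairs, since each pair accounts for 300 bp.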
Data can be downloaded from https://doi.org/10.5524/100344
Previous download location: http://gigadb.org/dataset/100344

Medium complexity data set for the 1st CAMI challenge: medium complexity community, sampled twice, with differential abundances of the respective organisms, and sequenced with both short and long insert sizes: 2 HiSeq samples of 15 Gbp each with small insert sizes, and 2 HiSeq samples of 5 Gbp each with large insert sizes (5 kb insert).
Number of samples: 2
Total Size: 40 Gbp
Read length: 2x150 bp
Insert size mean: 270 bp and 5 kbp
Insert size stddev: 10%

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . PATTERN
Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM . .
Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM . fq.gz
Download just the gold standard assembly of all samples:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM . gold_standard_medium_single.fasta.gz
Download just the mapping of gold standard assembly contigs to reference genomes:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM . pooled_gsa_mapping.binning.tsv
Data can be downloaded from https://doi.org/10.5524/100344
Previous download location: http://gigadb.org/dataset/100344

High complexity data set for the 1st CAMI challenge: Time series with 5 HiSeq samples of 15 Gbp each with small insert sizes, sampled from a complex microbial community.
Number of samples: 5
Total Size: 75 Gbp
Read length: 2x150 bp
Insert size mean: 270 bp
Insert size stddev: 10%

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . PATTERN
Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH . .
Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH . fq.gz
Download just the gold standard assembly of all samples:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH . CAMI_high_GoldStandardAssembly.fasta.gz
Download just the mapping of contigs to reference genomes:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH . gsa_mapping_pool.binning
Data can be downloaded from https://doi.org/10.5524/100344
Previous download location: http://gigadb.org/dataset/100344

This is a toy data set simulated from public genomes. You can use this for testing your tools (gold standards provided). THIS IS NOT A CHALLENGE DATA SET.
Genomes: 30
Total Size: 15 Gbp
Read length: 2x100 bp
Insert size mean: 180 bp
Insert size stddev: 10%

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_LOW
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . PATTERN
Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_LOW . .
Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_LOW . fq.gz
Download just the gold standard assembly:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_LOW . S_S001__genomes_30__insert_180_gsa_anonymous.fasta.gz
Download just the mapping of gold standard assembly contigs to reference genomes:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_LOW . S_S001__genomes_30__insert_180_gsa_mapping.tsv.gz
Data can be downloaded from https://doi.org/10.5524/100344
Previous download location: http://gigadb.org/dataset/100344

This is a toy data set simulated from public genomes. You can use this for testing your tools (gold standards provided). THIS IS NOT A CHALLENGE DATA SET.
Two samples with differential abundance: 2 HiSeq (small insert size) 15 Gbp samples from 225 genomes. From the same two differential abundance community profiles, 2 HiSeq (5 kb insert size) 0.75 Gbp samples.

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_MEDIUM
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . PATTERN
Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_MEDIUM . .
Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_MEDIUM . fq.gz
Download just the gold standard assembly of all samples:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_MEDIUM . M1_M2_pooled_gsa_anonymous.fasta.gz
Download just the mapping of gold standard assembly contigs to reference genomes:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_MEDIUM . M1_M2_pooled_gsa_mapping.tsv.gz
Data can be downloaded from https://doi.org/10.5524/100344
Previous download location: http://gigadb.org/dataset/100344

This is a toy data set simulated from public genomes. You can use this for testing your tools (gold standards provided). THIS IS NOT A CHALLENGE DATA SET.
5 HiSeq (small insert size) samples (time series) from 450 genomes, 15 Gbp each
Insert size mean: 180 bp
Insert size stddev: 18 bp
Read length: 2x100 bp

You can download the data using the cami client java tool https://data.cami-challenge.org/camiClient.jar

List all files:
> java -jar camiClient.jar -l https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_HIGH
If you only want to download a subset of the data, use a PATTERN which matches the files you want to retrieve. Use the listing (see above) to pick what you want to download.
> java -jar camiClient.jar -d URL . PATTERN
Download all files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_HIGH . .
Download just fastq files:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_HIGH . fq.gz
Download just the gold standard assembly of all samples:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_HIGH . H_pooled_gsa_anonymous.fasta.gz
Download just the mapping of gold standard assembly contigs to reference genomes:
> java -jar camiClient.jar -d https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_TOY_HIGH . H_pooled_gsa_mapping.tsv.gz
CAMI 1 Challenge NCBI Taxonomy database
NCBI Taxonomy database as of 2015/06/22 to be used for CAMI 1 challenge datasets
CAMI 1 NCBI RefSeq and Taxonomy Database as of 2015/06/22
This tarball is a copy of the NCBI RefSeq and Taxonomy Database as of 2015/06/22. This database should be used as a basis for reference-based binning and profiling tools for the CAMI 1 challenge datasets.
CAMI 1 Taxonomy database for camiClient
Taxonomy database to be used for cami upload client
CAMI 2 Challenge Accession to Taxid Mapping
NCBI accession to taxid mapping as of 2019/01/08 to be used for CAMI 2 challenge datasets
CAMI 2 Challenge Blast nr
NCBI nr database as of 2019/01/08 to be used for CAMI 2 challenge datasets
CAMI 2 Challenge Blast nt
NCBI nt database as of 2019/01/08 to be used for CAMI 2 challenge datasets
CAMI 2 Challenge NCBI RefSeq database
NCBI RefSeq database as of 2019/01/08 to be used for CAMI 2 challenge datasets
CAMI 2 Challenge NCBI Taxonomy
NCBI Taxonomy database as of 2019/01/08 to be used for CAMI 2 challenge datasets
CAMI 2 Taxonomy database for camiClient
Taxonomy database to be used for cami upload client
CAMI 2 Toy NCBI Taxonomy database
NCBI Taxonomy database as of 2018/02/26 to be used for CAMI 2 toy challenge datasets.