The pan-genome is presented in the form of a tube map
The expanded reference set will boost applications of genomics as the field increasingly moves from lab research to patient diagnosis and therapy © Darryl Leja/NHGRI

Scientists have compiled the “pan-genome”, a greatly expanded database of the biochemical letters that form individuals’ DNA, raising the prospect of improved diagnosis and treatment of genetic diseases and offering new insights into human diversity.

The first results of the global collaboration, combining the genomes of 47 people, showed significantly more variation in DNA between different people than scientists had previously appreciated, said Evan Eichler, professor of genome sciences at the University of Washington in Seattle.

The expanded reference set of genomes, sequenced more completely and accurately than ever before, does not yet represent the entire set of human genes, but scientists say the project marks a big step forward in understanding genetic variations. The findings were published on Wednesday in a series of papers in Nature and other journals.

“With this new pan-genome reference, thousands of complex genetic variants previously too complicated to handle can now be included in studies to understand genetic risks for common diseases,” said Tobias Marschall, a participant from Heinrich Heine University in Düsseldorf.

Scientists proclaimed the completion of the first human genome sequence, reading the 3bn biochemical letters that store our genetic code, more than 20 years ago. But the first draft was far from complete, with large stretches of DNA — the molecules that contain genetic information — lying beyond the technological reach of scientists.

The pan-genome is presented in the form of a tube map, showing the varied routes that DNA sequences take as they undergo different mutations and transformations. A, C, G and T are the four letters of the genetic code © Darryl Leja/NHGRI

Since then, millions of “whole genomes” have been read with varying levels of accuracy and completeness in commercial, clinical and research sequencing programmes, but without the thoroughness achieved with the “long read” technology used in the pan-genome project, which reaches previously inaccessible DNA but costs about 10 times more per genome.

At present, scientists use one standard reference human genome as a point of comparison for genetic analysis. This comes mainly from a single person, with contributions from about 20 others. Because it is only 92 per cent complete and draws so heavily on one individual, it is seen as an insufficient representation of human diversity.

The pan-genome, in contrast, weaves together 47 genomes, each more than 99 per cent complete, in new graphical representations. One representation resembles a tube or subway route map, illustrating the many alternative routes a sequence takes in different genomes.

The project adds sequences containing 119mn previously unrecorded chemical letters — the “bases” represented as A, C, G and T — to the reference genome. These come mostly from large structural variations, which can transpose thousands of bases together, rather than individual mutations in single letters.

The expanded reference set will boost applications of genomics as the field increasingly moves from lab research to patient diagnosis and therapy.

“The interpretation of genome sequencing data is becoming increasingly important for routine clinical practice; it is crucial to transition to a reference that represents global genetic diversity and therefore reduces biases,” Marschall said.

The US National Human Genome Research Institute leads the pan-genome consortium with 14 other scientific bodies around the world. It aims to expand the number of genomes included to 350 by mid-2024 and eventually to reach 700.

“The number one goal of the pan-genome reference is to try to broaden the representation of a reference resource to be more inclusive and more equitable for studying the human species,” said Karen Miga, a project leader from the University of California Santa Cruz.

Copyright The Financial Times Limited 2024. All rights reserved.