Standardizing workflows in imaging transcriptomics with the abagen�toolbox

Page 1

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

1 of 27

Standardizing workflows in imaging

transcriptomics with the abagen�toolbox

Ross D Markello1*, Aurina Arnatkeviciute2, Jean- Baptiste Poline1, Ben D Fulcher3,

Alex Fornito2, Bratislav Misic1*

1McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University,

Montreal, Canada; 2School of Psychological Sciences & Monash Biomedical Imaging,

Monash University, Clayton, Australia; 3School of Physics, University of Sydney,

Sydney, Australia

Abstract Gene expression fundamentally shapes the structural and functional architecture of

the human brain. Open- access transcriptomic datasets like the Allen Human Brain Atlas provide

an unprecedented ability to examine these mechanisms in vivo; however, a lack of standardization

across research groups has given rise to myriad processing pipelines for using these data. Here,

we develop the abagen toolbox, an open- access software package for working with transcriptomic

data, and use it to examine how methodological variability influences the outcomes of research

using the Allen Human Brain Atlas. Applying three prototypical analyses to the outputs of 750,000

unique processing pipelines, we find that choice of pipeline has a large impact on research findings,

with parameters commonly varied in the literature influencing correlations between derived gene

expression and other imaging phenotypes by as much as ρ ≥ 1.0. Our results further reveal an

ordering of parameter importance, with processing steps that influence gene normalization yielding

the greatest impact on downstream statistical inferences and conclusions. The presented work and

the development of the abagen toolbox lay the foundation for more standardized and systematic

research in imaging transcriptomics, and will help to advance future understanding of the influence

of gene expression in the human brain.

Editor's evaluation

This paper will be of interest to scientists studying the large- scale transcriptomic organization of

the human brain, and in particular those who have used or plan to use the Allen Human Brain Atlas

dataset. The study is well- motivated and novel. The most striking finding is the magnitude of vari-

ability that is introduced by different data processing decisions. The open- source software described

in this study is comprehensive, well documented, and is an important contribution to the field.

Introduction

Technologies like magnetic resonance imaging (MRI) provide unique insights into macroscopic brain

structure and function in vivo. Modern research increasingly emphasizes how microscale attributes,

such as gene expression, influence these imaging- derived phenotypes (Fornito et�al., 2019; Arnat-

keviciute et�al., 2019; Arnatkevičiūtė et�al., 2021). Gene expression is particularly useful as it is a

fundamental molecular phenotype that can be plausibly linked to the function of biological pathways

(Whitaker et�al., 2016; Seidlitz et�al., 2018), protein synthesis (Zheng et�al., 2019), receptor distri-

butions (Beliveau et�al., 2017; N�rgaard et�al., 2021; Shine et�al., 2019; Deco et�al., 2020; Preller

et�al., 2018), and cell types (Hansen et�al., 2021; Anderson et�al., 2020b; Anderson et�al., 2018;

Seidlitz et�al., 2020; Gao et�al., 2020). However, researchers looking to bridge these macro- and

TOOLS AND RESOURCES

*For correspondence:

ross. markello@ mail. mcgill. ca

(RDM);

bratislav. misic@ mcgill. ca (BM)

Funding: See page 20

Preprinted: 09 July 2021

Received: 12 July 2021

Accepted: 15 November 2021

Published: 16 November 2021

Reviewing Editor: Saad Jbabdi,

University of Oxford, United

Kingdom

article is distributed under the

terms of the Creative Commons

Attribution License, which

permits unrestricted use and

redistribution provided that the

original author and source are

credited.

Page 2

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

2 of 27

microscopic phenotypes must overcome multiple challenges. Although there are numerous technical

and analytic considerations, one foundational issue is that acquiring high- quality transcriptomic data

from the human brain is both costly and highly invasive, requiring budgets far greater than most typical

neuroimaging studies and restrictive access to tissue from post- mortem donors or cranial surgical

patients. As such, researchers must often rely on freely available repositories of gene expression data.

There exist multiple open- access repositories for gene expression in the human brain, including

BrainSpan (Miller et�al., 2014; Kang et�al., 2011) and PsychENCODE (Gandal et�al., 2018; Li et�al.,

2018; Wang et�al., 2018; among others: Sousa et�al., 2017; Darmanis et�al., 2015; Lake et�al.,

2016); however, these datasets generally provide relatively sparse anatomical coverage, limiting the

types of analyses that can be performed. Thus, researchers who aim to compare transcriptomic expres-

sion with whole- brain imaging- derived phenotypes have primarily relied on the Allen Human Brain

Atlas (AHBA; Hawrylycz et�al., 2012; Hawrylycz et�al., 2015). Initially released in 2010, the AHBA

remains the most spatially comprehensive dataset of its kind. Derived from bulk microarray analysis

of tissue samples obtained from six donors, the AHBA provides expression data for more than 20,000

genes across 3702 brain areas in MRI- derived stereotactic space. With its superior resolution, the

AHBA has significantly contributed to the emergence of the field of imaging transcriptomics (Fornito

et�al., 2019), enabling dozens of studies over the past decade examining relationships between gene

expression and an array of macroscale imaging attributes, including cortical thickness (Shin et�al.,

2018), myelination (Burt et�al., 2018), developmental brain maturation (Whitaker et�al., 2016; Kirsch

and Chechik, 2016), structural brain networks (Seidlitz et�al., 2018; Romero- Garcia et�al., 2018;

Arnatkevičiūtė et�al., 2020), functional brain networks (Richiardi et�al., 2015; Krienen et�al., 2016;

V�rtes et�al., 2016), and human cognition (Fox et�al., 2014; Hansen et�al., 2021). The AHBA has also

highlighted the importance of whole- brain gene expression in neurological and psychiatric diseases,

where it has become increasingly clear that transcriptional pathways play a critical role in shaping the

broader dynamics of disease progression and emergent symptomatology (Zheng et�al., 2019; Shafiei

et�al., 2021; Henderson et�al., 2019; Vogel et�al., 2020; Rittman et�al., 2016; Anderson et�al.,

2020a; Romme et�al., 2017; McColgan et�al., 2018; Morgan et�al., 2019).

Since its release, several software toolboxes have been developed to help researchers use tran-

scriptional data from the AHBA (French and Paus, 2015; Gorgolewski et�al., 2015; Rittman et�al.,

2017; Rizzo et�al., 2016); however, these tools often focus primarily on facilitating integration of

the AHBA with neuroimaging data, offering limited if any functionality for modifying how the data

are processed prior to analysis. Instead, a recent comprehensive review revealed that many research

groups have opted to develop their own processing pipelines for the AHBA (Arnatkeviciute et�al.,

2019). Unfortunately, as there are no field- accepted standards for processing imaging transcriptomic

data, the generated pipelines vary substantially across groups.

The extent to which such processing variability affects analytic outcomes from the AHBA remains

unknown. Indeed, over the past decade neuroimaging research has shown that methodological vari-

ability can have broad influences on analyses using structural MRI (Bhagwat et�al., 2021; Khara-

bian Masouleh et�al., 2020), diffusion MRI (Oldham et�al., 2020; Maier- Hein et�al., 2017; Schilling

et�al., 2019), task fMRI (Carp, 2012; Botvinik- Nezer et�al., 2020), and resting- state fMRI (Parkes

et� al., 2018; Ciric et�al., 2017). Although researchers are beginning to grapple with the conse-

quences of this variability, the lack of baseline gene expression datasets against which to compare

new results impedes the development of standardized practices. In these situations, some researchers

have proposed performing ‘multiverse’ analyses (Steegen et�al., 2016; Dragicevic et�al., 2019),

wherein all possible permutations of data processing are analyzed and the full range of analytic results

reported. Although such analyses can be computationally intensive, they offer a path to understand

how processing choices impact statistical inferences and conclusions, and provide a mechanism by

which to help researchers converge on an optimal pipeline.

Here, we comprehensively investigate how different processing choices influence the results of

analyses using the AHBA. First, we develop an open- source Python toolbox, abagen, that collates all

possible processing parameters into a set of turn- key workflows, optimized for flexibility and ease-

of- use. We then use the toolbox to process the AHBA through approximately 750,000 unique pipe-

lines. Across three prototypical imaging transcriptomic analyses, we examine whether and how these

different processing options modify derived statistical estimates and quantify the relative importance

of each option. Next, we replicate a curated set of processing pipelines from the literature to assess

Page 3

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

3 of 27

how previously reported findings compare to the full range of potential outcomes observed across

all examined pipelines. Finally, we end with a set of recommendations, integrated directly into the

developed abagen toolbox, to promote standardized use of the AHBA in future work.

Results

We introduce the abagen toolbox, an open- access software package designed to streamline processing

and preparation of the AHBA for integration with neuroimaging data (Markello et�al., 2021c, available

at https:// github. com/ rmarkello/ abagen; Markello, 2021b copy archived at swh:1:rev:2aeab5bd0f-

147fa76b488645e148a1c18095378d). Supporting several workflows, abagen offers functionality for

Table 1. Abagen pipeline options.

Overview of 17 options to be considered when processing the AHBA data. The Choices column

indicates the number of parameters explored in the current report (numerator) and the total

number of parameters possible for the given option (denominator). A denominator of n indicates

a hypothetically near- infinite parameter space. The Description column gives a brief overview of

the processing choice; for more detail refer to the relevant section in Materials and methods: Gene

expression pipelines.

Option

Choices

Description

Volumetric or surface atlas

2/2

Whether to use a volumetric or surface

representation of the atlas

Individualized or group atlas

1/2

Whether to use individualized donor-

specific atlases or a group- level atlas

Use non- linear MNI coordinates

2/2

Whether to use updated MNI

coordinates provided by alleninf

package

Mirror samples across L/R hemisphere

3/4

Whether to mirror (i.e., duplicate)

samples across hemisphere boundary

Update probe- to- gene annotations

2/2

Whether to update probe annotations

Intensity- based filtering threshold

3/ n

Threshold for intensity- based filtering

of probes

Inter- areal similarity threshold

1/ n

Threshold for removing samples with

low inter- areal correspondence

Probe selection method

6/8

Method by which to select which

probe(s) should represent a given gene

Donor- specific probe selection

3/3

How specified probe selection should

integrate data from different donors

Missing data method

2/3

How to handle when brain regions are

not assigned expression data

Sample- to- region matching tolerance

3/ n

Distance tolerance for matching tissue

samples to atlas brain regions

Sample normalization method

3/10

Method for normalizing tissue samples

(across genes)

Gene normalization method

3/10

Method for normalizing genes (across

tissue samples)

Normalize only matched samples

2/2

Whether to perform gene normalization

for all versus matched samples

Normalizing discrete structures

2/2

Whether to perform gene normalization

within structural classes

Sample- to- region combination method 2/2

Whether to aggregate tissue samples in

regions within or across donors

Sample- to- region combination metric

2/2

Metric for aggregating tissue samples

into atlas brain regions

Page 4

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

4 of 27

an array of analyses and has already been used in several peer- reviewed publications and preprints

(Shafiei et�al., 2020; Hansen et�al., 2021; Shafiei et�al., 2021; Brown et�al., 2021; Park et�al.,

2021; Valk et�al., 2021; Zhao et�al., 2020; Benkarim et�al., 2020; Ding et�al., 2021; Park et�al.,

2020; Lariviere et�al., 2020; Martins et�al., 2021). The primary workflow, used to generate regional

gene expression matrices, integrates 17 distinct processing steps that have previously been employed

by research groups throughout the published literature (Table� 1). We refer to each unique set of

processing choices and parameters as a ‘pipeline’. The following results use abagen to investigate

how variable application of these processing steps can impact analyses of AHBA data.

Processing choices influence transcriptomic analyses

To understand how choices made during the processing of AHBA data impact downstream analyses,

we enumerated 17 decision points (i.e. processing steps or options) that have been modified and used

in the literature (Table�1). From these 17 steps we implemented 746,496 distinct processing pipelines,

where each pipeline parcellated microarray expression from the AHBA with the Desikan- Killiany atlas

(Desikan et�al., 2006) to generate a unique brain region- by- gene expression matrix.

Analyses of expression data from the AHBA can be grouped into one of three broad classes (Fornito

et�al., 2019): correlated gene expression analyses, gene co- expression analyses, and regional gene

expression analyses. Correlated gene expression analyses examine the correlation between brain

regions across genes, yielding a symmetric region � region matrix (similar to a functional connectivity

matrix). Gene co- expression analyses, on the other hand, examine the correlation between genes

across brain regions, yielding a symmetric gene � gene matrix. Finally, regional gene expression

analyses examine the expression patterns of specific genes or gene sets in relation to other imaging-

derived phenotypes.

To examine how differences in processing choices may impact both the expression matrices gener-

ated from the different pipelines and derived statistical estimates we ran one analysis from each of

these classes on the matrices generated by each processing pipeline. Notably, these analyses are

either direct reproductions or variations of analyses that have been previously published (Arnat-

keviciute et�al., 2019; Oldham et�al., 2008; Hawrylycz et�al., 2012; Burt et�al., 2018). Although

there is no ground truth for any of these analyses, findings from previous work offer some context

for interpreting the observed results (i.e. data from other species and other modalities; Lau et�al.,

2021). Nonetheless, we primarily focus on highlighting the potential variability resulting from different

processing pipelines.

Correlated gene expression (CGE)

First, we separately correlated the rows of each expression matrix to generate symmetric region �

region ‘correlated gene expression’ matrices, indicating the similarity of gene expression profiles

between different brain regions (Figure�1a). Previous work in other species has reliably observed that

transcriptional similarity in the brain decays with increasing separation distance (Fulcher et�al., 2019;

Lau et�al., 2021). This distance- dependent relationship is an expected feature due to the functional

specialization of brain regions, and is consistent with other imaging- derived phenotypes in humans

(Roberts et�al., 2016; Goulas et�al., 2019; Betzel and Bassett, 2018; Mišić et�al., 2014; Shafiei

et�al., 2020; Horv�t et�al., 2016). We assessed this relationship by extracting the upper triangle of

the correlated gene expression matrices and correlating them with the upper triangle of a regional

distance matrix, derived by computing the average Euclidean distance between brain region centroids

in the Desikan- Killiany atlas (Figure�1a, left panel). Although previous work has highlighted that this

relationship is exponential (Arnatkeviciute et�al., 2019), we computed the Spearman correlation as

both statistics should exhibit similar variability across pipelines and the latter is less computationally

expensive.

Gene co-expression (GCE)

For the second type of analysis we separately correlated the columns of each expression matrix to

generate gene � gene ‘co- expression’ (GCE) matrices, indicating the similarity in spatial expression

patterns between all pairs of genes (Figure�1a). A significant body of research has shown that genes

tend to form functional communities, exhibiting synchronized expression patterns across space and

time (Oldham et�al., 2008), such that gene co- expression patterns tend to be more similar within than

Page 5

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

5 of 27

Figure 1. Processing choices influence transcriptomic analyses. (a) Examples of the three analyses used to assess differences in gene expression

matrices generated by transcriptomic pipelines. First row: a depiction of the region- by- gene expression matrix generated from one of the 746,496

tested processing pipelines. Second row, left: we compute the correlation between rows of each matrix to generate a symmetric region � region CGE

matrix. We then compute the correlation between the upper triangle of this CGE matrix and the upper triangle of a regional distance matrix to examine

the degree to which CGE decays with increasing distance between regions (Arnatkeviciute et�al., 2019). Second row, middle: we compute the

Euclidean distance between columns of each matrix to generate a gene � gene GCE matrix. We use previously defined functional gene communities

(Oldham et�al., 2008) to compute a silhouette score for this GCE matrix to investigate whether genes within a module have more similar patterns of

spatial expression than genes between modules. Second row, right: the first principal component is extracted from the RGE matrix. We compute the

correlation between this principal component and the whole- brain T1w/T2w ratio (Burt et�al., 2018) to understand how closely these maps covary

across the brain. (b) The full statistical distributions from each of the three analyses for all 746,496 pipelines. Left panel: Spearman correlation values,

ρ, from the CGE analyses. Middle panel: silhouette scores from the GCE analyses. Right panel: Spearman correlation coefficients, ρ, from the RGE

analyses. CGE: correlated gene expression; GCE: gene co- expression; RGE: regional gene expression.

Page 6

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

6 of 27

between such communities. Here, we obtained a set of gene community assignments derived for the

brain from a previously studied human transcriptomic dataset (Oldham et�al., 2008). We used these

community assignments to calculate a silhouette score (Rousseeuw, 1987) for the gene co- expression

matrices generated by each pipeline, measuring how well these communities represented the derived

co- expression patterns (Figure�1a, middle panel).

Regional gene expression (RGE)

For the third type of transcriptomic analysis, we focused on regional correlations between gene

expression measures and an MRI- derived phenotype. Our regional expression measure was defined

by computing the first principal component of the region- by- gene expression matrix, representing

the axis of maximum spatial variation of gene expression in the brain observed under a given AHBA

processing pipeline. As gene expression fundamentally shapes the structure and function of the

human brain, it is likely that this principal component may exhibit similar spatial variability to other

imaging- derived measures. Recent work has highlighted that the T1w/T2w ratio is a robust pheno-

type that exhibits patterns of regional variation consistent with other microstructural and functional

properties (Gao et�al., 2020; Burt et�al., 2018; Demirtaş et�al., 2019; Fulcher et�al., 2019). We

therefore correlated the first principal component of gene expression with the whole- brain T1w/T2w

ratio (Figure�1a, right panel), measuring the extent to which these values covary across the cortex.

Pipeline distributions

Results from these three analyses reveal that choice of processing pipeline dramatically influences

derived statistical estimates (i.e. the CGE- distance correlation, the gene co- expression silhouette

score, and the spatial correlations between gene PC1 and whole- brain T1w/T2w ratio; Figure�1b).

We observe that all three of the generated distributions of statistical estimates across the 746,496

pipelines have wide ranges (correlated gene expression: [-0.51,–0.13]; gene co- expression: [-0.78,–

0.18]; regional gene expression: [0.00, 0.90]) and are either bimodal (Figure�1b, left/middle panels) or

heavily skewed (Figure�1b, right panel).

Since there is no ground truth for these analyses we cannot quantitatively assess whether some

pipelines are more or less accurate than others. However, there is strong qualitative evidence to

suggest that correlated gene expression should be lower between brain regions that are farther apart

(Arnatkeviciute et�al., 2019; Krienen et�al., 2016; Richiardi et�al., 2015; Fulcher et�al., 2019; Lau

et�al., 2021). It is notable, then, that the distribution of distance- dependent estimates is so strongly

bimodal (splitting at r ≈ −0.4), suggesting two very different perspectives on the size of this effect

(Figure�1a and b, left panels). As increasingly- detailed single- cell transcriptional data become avail-

able (e.g. Yao et�al., 2021) we may be able to use these estimates to determine accuracy; for now,

we simply note that even for this estimate with strong biological priors we see considerable variability.

Similar variability can be observed for the other two analyses. While all the pipelines demonstrate

relatively poor fit of gene communities to the derived gene co- expression matrices (refer to Materials

and methods: Analytic approaches for information on why this is not unexpected), we observe that

a portion of the pipelines yield far worse correspondence (Figure�1a and b, middle panels). More-

over, while the correlations between gene PC1 and whole- brain T1w/T2w ratio are largely consistent

across pipelines, there are a small group of pipelines that yield correlations that deviate by ρ ≈ 1.0.

Notably, the parameter choices for these pipelines are not pathological—that is, their use could be

justified—and, as we discuss later (see Results: Variability in parameter importance), modifying just

one parameter setting can yield changes in effect sizes within this range.

Collectively, we find that for all three of these analyses there is substantial variability in the statis-

tical estimates generated by different processing pipelines, and this variability is large enough that,

across pipelines, it has a meaningful difference in the potential inferences and conclusions that can

be drawn.

Variability in parameter importance

Next, we quantified the relative importance of different processing steps and parameters on our three

derived statistical estimates. While researchers must ultimately make choices for each of the steps

individually when processing AHBA data, we wanted to investigate whether unique choices have

Page 7

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

7 of 27

distinct influences. Moreover, which parameters are most important may differ based on the type of

analysis performed.

We investigated parameter importance by calculating a distribution of difference scores for each

parameter, measuring the extent to which changing each parameter—holding all other parameters

constant—influences the derived statistical metrics from each of the three analyses. For example,

given a processing parameter with two choices this procedure yielded a distribution of N/2 difference

scores per analysis, where N is the total number of pipelines (i.e. 746, 496/2 = 373, 248). We averaged

these distributions separately for each analysis to generate a single, summary ‘impact score’ for each

processing step, which we then rank- ordered independently for each analysis.

We find considerable agreement in which parameters are the most impactful across analyses

(Figure�2a): the most influential processing steps often involve procedures that influence the gene

normalization process in some way (e.g. gene normalization method, normalizing only matched

samples; Figure�2b). On the other hand, among the least impactful parameters are choices concerning

donor- specific probe selection and handling of missing data. It is worth noting that of the probe selec-

tion methods tested in the current manuscript (i.e. max intensity, correlation intensity, correlation vari-

ance, differential stability, RNAseq correlation, and averaging), three of the six all render the choice of

donor- specific probe selection redundant. In other words, these three methods are mutually exclusive

with choice of donor- specific probe selection, potentially confounding our ability to measure the real

Figure 2. Parameter choice differentially impacts statistical estimates. (a) Rank of the relative importance for each parameter ( y- axis) across all three

analyses ( x- axis). Warmer colors indicate parameters that have a greater influence on statistical estimates. (b) Statistical distributions from the three

analyses, shown as kernel density plots, separated by choice of gene normalization method (the most impactful parameter as shown in panel a). (c)

Density plots of the statistical estimates for all 746,496 pipelines shown along the first two principal components, derived from the 746,496 (pipeline)

x 3 (statistical estimates) matrix, representing how different the statistical estimates from each of the three analyses are relative to other pipelines. Left

panel: pipelines are colored based on choice of gene normalization method, where each color represents 1/3 of the pipelines. Here, the pipelines in

which no normalization was applied (purple) are distinguished from those in which some form of normalization was applied (blue and brown). Right

panel: pipelines are colored based on whether gene normalization was performed within (True, red) or across (False, purple) structural classes (i.e.

cortex, subcortex/brainstem, cerebellum; see Materials and methods: Gene expression pipelines for more information).

Page 8

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

8 of 27

influence of this parameter. We also highlight that choice of atlas may influence the impact of missing

data handling: since the Desikan- Killiany atlas is a relatively low- resolution atlas (68 nodes), expres-

sion matrices generated from the tested pipelines are missing, at most, data for two brain regions.

It is possible that handling of missing data may be more important when higher- resolution parcella-

tions are employed. That is, while some parameters do not appear to affect our results in aggregate,

there are potentially specific research questions where these parameters could play an important and

impactful role.

To investigate those parameters that did play an influential role in the current analyses, we visu-

alized their impact by examining the statistical distributions from each analysis separated by the

different parameter choices (shown in Figure�2b for gene normalization method). Dividing the distri-

butions in this way highlights how strongly parameter choice can influence the outcomes of the anal-

yses: for example, when no gene normalization is employed the resulting estimates are dramatically

shifted from those generated by pipelines that employed some form of normalization (Figure�2b; no

normalization: purple distribution). Indeed, the bimodality and skew observed in the full statistical

distributions for the analyses (Figure�1b) is almost entirely explained by this single parameter choice.

To investigate more qualitative differences in how parameter choice influences the processing

pipelines we performed a principal component analysis (PCA) on the matrix of statistical estimates

from the three analyses (i.e. the 746, 496 � 3 pipeline- by- analysis matrix). We extracted the first two

principal components from the statistical estimate matrix (variance explained: PC1 = 70%, PC2 = 26%)

and examined how pipeline scores were distributed along these axes (Figure� 2c). Delineating the

distribution of pipelines based on parameter choice underscores how these options impact the sepa-

rability of resulting statistical estimates. Reinforcing results presented above, we find that the choice

of gene normalization method distinguishes the one- third of pipelines with no normalization (purple)

from the remaining two- thirds that applied some form of normalization (blue and brown; Figure�2c,

left). It is clear from the distribution of pipelines, however, that other processing choices interact with

this parameter. For example, plotting the pipelines by whether the gene normalization was performed

separately on samples within each structural class (i.e. cerebral cortex, subcortex, cerebellum) rather

than across all tissue samples further delineates the pipelines that applied gene normalization into two

distinct clusters (Figure�2c, right).

These results reveal how different processing steps are grouped in terms of their importance to

analyses of the AHBA, with some groups demonstrating greater potential impact. Broadly, parame-

ters modifying normalization are the most important, followed by parameters influencing how tissue

samples are matched to brain regions, and finally parameters impacting probe selection. Moreover,

we find that choices within each processing step do not all have an equivalent impact on derived esti-

mates (i.e. performing no gene normalization has a much greater influence than choosing between

the two other forms of normalization tested).

Reproducing published analyses

The previous subsections demonstrate variability across the complete range of reasonable processing

pipelines; however, many of these pipelines have not yet been used in practice. To investigate whether

the subset of pipelines that have already been implemented in the published literature display similar

variability, we used abagen to reproduce the processing procedures from nine peer- reviewed articles

that (1) are highly- cited within the field, (2) highlight a wide range of processing options, and (3)

sufficiently describe their processing pipelines such that they could be reproduced. We explored how

different the gene expression values and statistical outcomes generated by these published pipelines

were (Hawrylycz et�al., 2015; French and Paus, 2015; Whitaker et�al., 2016; Krienen et�al., 2016;

Anderson et�al., 2018; Burt et�al., 2018; Romero- Garcia et�al., 2018; Anderson et�al., 2020b;

Liu et�al., 2020). To ensure comparability, we standardized the choice of brain parcellation across

pipelines, using the Desikan- Killiany atlas in all instances. The pipelines were used to generate nine

region- by- gene expression matrices, which were then subjected to the same three analyses described

previously.

In reproducing the pipelines we note important differences in processing parameter selection

(Figure� 3a), and find that this variability results in slight discrepancies between gene expression

values generated by the pipelines. For example, looking at the distribution of cortical somatostatin

(SST), a gene discussed heavily in Anderson et�al., 2020b where it used as a proxy for somatostatin

Page 9

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

9 of 27

interneuron density (Fulcher, 2019), we observe some variation between pipelines (Figure�3b and

c). Although we find moderate consistency in the statistical estimates generated by the pipelines,

there are important differences (ranges: correlated gene expression [-0.49,–0.28], gene co- expression

[-0.70,–0.24], regional gene expression [0.34, 0.88]; Figure�3c). One outlier is the single pipeline that

did not appear to implement any form of gene normalization (French and Paus, 2015), supporting

earlier results demonstrating the importance of this processing step on downstream expression esti-

mates. This is potentially notable as the processed expression data from this pipeline were made

openly available and have been used in analyses by other researchers (e.g. Sepulcre et�al., 2018;

Beliveau et�al., 2017).

Given that imaging transcriptomics is still relatively new and there has been limited work addressing

best practices in the field (Arnatkeviciute et�al., 2019), these results stress the importance of stan-

dardization in use of the AHBA among research groups. Although variation in processing can osten-

sibly lead to similar inferences in specific analyses, even minor differences in processing choices

Figure 3. Reproducing published pipelines. (a) Parameter choices used in the reproduction of published pipelines. Processing steps with categorical

choices (e.g., gene normalization) were converted to numerical choices for display purposes only. These choices reflect the range of choices

enumerated in Table�1. (b) Relative expression values of cortical somatostatin (SST) generated by each of the reproduced pipelines. Value ranges vary

based on pipeline processing options. (c) The Pearson correlation between the cortical somatostatin (SST) maps generated by the nine pipelines shown

in panel (b). (d) Statistical estimates from the three analyses described in Materials and methods: Analytic approaches applied to expression data from

each of the published pipelines.

Page 10

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

10 of 27

consistently yield measurable discrepancies in derived expression data. Without proper standardiza-

tion, these differences will compound and become more problematic as the field continues to grow.

Standardized processing and reporting with the abagen toolbox

Across all of our analyses we find that choice of processing steps and parameters can have a strong

influence on the statistical outcomes of research with the AHBA. Here, we briefly highlight features

that we have integrated into the abagen toolbox to facilitate standardization in future research.

The abagen toolbox supports two use- case driven workflows: (1) a workflow that accepts an atlas

and returns a parcellated, preprocessed regional gene expression matrix (Figure�4a); and, (2) a work-

flow that accepts a mask and returns preprocessed expression data for all tissue samples within the

Figure 4. Workflows and features in the abagen toolbox. (a) The primary workflow of abagen, used in the reported analyses, accepts a brain atlas and

returns a parcellated brain- region- by- gene expression matrix. (b) An alternative abagen workflow accepts a regional mask and returns a processed

tissue- sample- by- gene expression matrix, for all tissue samples from the six AHBA donors that fall within boundaries of the mask. (c) Examples of

selected features from the abagen workflows and additional toolbox functionality. Top left: examples of some commonly- used atlases that can be

employed with the parcellation workflow shown in panel (a). Bottom left: abagen can accept either standard atlases (i.e. in MNI space) or atlases defined

in the space of the six individual donors from the AHBA. Top right: an additional workflow available in abagen can be used to generate densely-

interpolated expression maps from AHBA data using a k- nearest neighbors interpolation algorithm. Bottom right: using high- resolution atlases in the

parcellation workflow (panel a) may result in some parcels being assigned no expression data; abagen supports two methods for assigning values to

such regions.

Page 11

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

11 of 27

mask (Figure�4b). Workflows can be called via a single line of code from either the command line or

Python terminal, and take approximately one minute to run with default settings using the Desikan-

Killiany atlas. The main output of abagen is a single brain region (or tissue sample) � gene expression

matrix. Changing the parameters may modify the shape of the matrix (e.g. different atlases will yield

different numbers of regions or samples) or different values (e.g. different processing choices may

yield different numbers of genes), but not the structure. The outputs of these workflows can be used

generally to examine the three prototypical research questions enabled by the AHBA: correlated gene

expression, gene co- expression, and regional expression of genes of interest more broadly (Fornito

et�al., 2019). Beyond its primary workflows, abagen has additional functionality for post- processing

the AHBA data (e.g. removing distance- dependent effects from expression data, calculating differ-

ential stability estimates; Hawrylycz et�al., 2015), and for accessing data from the companion Allen

Mouse Brain Atlas (e.g. providing interfaces for querying the Allen Mouse API; https:// mouse. brain-

map. org/; Lein et�al., 2007).

Although these workflows support the entire range of processing options that we assessed in

the current manuscript (Figure�4c), we have set the default options for all steps based on best prac-

tice recommendations developed in Arnatkeviciute et�al., 2019 and further informed by the results

Figure 5. Annotated example abagen report. Example of an automatically generated methods section report from the abagen toolbox. Processing

steps are shown on the left and the relevant methods text—which is updated when these steps are modified—is shown in the same font color on the

right. Reports also include a formatted reference section and relevant equations; these are not shown here for conciseness. Note that some processing

steps (e.g. normalizing within structures, missing data handling) are omitted here because they are not run by default (see Supplementary file 1).

Page 12

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

12 of 27

presented above (see Supplementary file 1 for a full list). We believe the default settings in abagen

will provide a reasonable starting point for researchers beginning to work with the AHBA; however, as

we have continually noted, the appropriate choices for some parameters will vary based on research

question. As such, to make it easier for researchers to report exactly what parameters they use, we

have integrated an automated reporting mechanism into the abagen workflows (Figure�5). The gener-

ated reports provide manuscript- ready step- by- step documentation describing all the processing

done to the AHBA data in the workflow, and are licensed CC0 (https:// creativecommons. org/ share-

your- work/ public- domain/ cc0/) so that they can be freely used without restriction.

Creation of the toolbox has followed best- practices in software development, including version

control, continuous integration testing, and modular code design. To encourage further use by new

research groups we provide comprehensive documentation on installing and working with the abagen

toolbox online (https:// abagen. readthedocs. io/).

Discussion

In the present report, we introduced the abagen toolbox, an open- source Python library for processing

transcriptomic data. Using abagen, we conducted a comprehensive analysis examining whether and

how different processing options modify statistical estimates derived from analyses using the AHBA.

We investigated how processing pipelines used in the literature compare to those we tested, and

provide recommendations for improving standardization and reporting of analyses using the AHBA,

highlighting how the abagen toolbox can facilitate future developments in this space.

Testing nearly 750,000 unique processing pipelines, we find that choice of processing parameters

can strongly influence statistical estimates derived from analyses of the AHBA, and that these choices

interact with the type of analysis performed (Figure�1). We observe significant variability with regard

to which parameters are most influential, finding that procedures modifying gene expression normal-

ization have a far greater impact on downstream analyses than other processing steps (Figure� 2).

Looking to the literature, we reproduce nine pipelines from published articles and find that, despite

notable inconsistencies in their processing choices, there is moderate consistency in their produced

statistical estimates (Figure�3). We demonstrate, however, that these summary estimates may obscure

meaningful differences in gene expression values derived by the pipelines, cautioning researchers to

be aware of how analytic choices may impact their findings.

Altogether, the present report provides a comprehensive assessment of how processing variability

can impact analyses in the field of imaging transcriptomics. Our results demonstrate how researcher

choices (or ‘researcher degrees of freedom’; Simmons et�al., 2011) can play a meaningful role in anal-

yses of the AHBA. However, these findings are not necessarily limited to the AHBA. Indeed, increasing

reliance on open- access datasets has begun to reveal unique challenges associated with data reuse

(Thompson et�al., 2020). Improved standardization and reporting among research groups using (and

re- using) openly available datasets may help to mitigate some of these challenges. We believe that

functionality in the abagen toolbox can support future researchers in overcoming these pitfalls and

improve reproducibility in processing and analyzing AHBA data.

Our results also show that not all processing choices are equal: that is, we find a hierarchy of

processing parameters, wherein procedures modifying gene normalization have the greatest impact

on analyses, followed by steps more broadly influencing the matching of tissue samples to brain

regions and finally by parameters that determine probe selection. Furthermore, we find that within

processing steps certain parameter choices may lead to more reasonable statistical estimates. In

particular, applying some form of gene normalization tends to improve the behavior of processed

expression data when compared to instances in which no normalization is applied (Figure� 1), but

there appear to be limited differences in the type of normalization used. Although we only considered

cortical tissue samples in the current analyses, we expect that including non- cortical samples would

further reinforce these results (Arnatkeviciute et�al., 2019) known differences in microarray expres-

sion values between cortex and subcortical structures will likely emphasize the impact of different

normalization procedures across pipelines. Critically, these findings largely agree with previous

recommendations developed by Arnatkeviciute et�al., 2019, and we have chosen default parameter

choices for abagen workflows accordingly.

Note that there are some processing steps that should be performed in a specific sequence, and

others whose order could potentially be interchanged. For example, intensity- based filtering of probes

Page 13

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

13 of 27

must always be performed before probe selection—reversing the order of these operations would,

in the majority of cases, be problematic because it would potentially result in the selection of noisy

probes to be carried through to analysis. However, the order of other steps (i.e. sample versus gene

normalization) could arguably be reversed with no ostensible detriment. This procedural ambiguity is

a salient example of the need to standardize workflows.

More broadly, this work builds on increasing efforts to examine the importance of methodological

choices and analytical flexibility in human neuroimaging research (Bhagwat et�al., 2021; Kharabian

Masouleh et�al., 2020; Oldham et�al., 2020; Maier- Hein et�al., 2017; Schilling et�al., 2019; Carp,

2012; Botvinik- Nezer et�al., 2020; Parkes et�al., 2018; Ciric et�al., 2017). Thankfully, emerging

technical solutions have begun to tackle these issues via the development of tools that aim to abstract

away sources of variation (e.g. fMRIPrep, Esteban et�al., 2019; QSIPrep, Cieslak et�al., 2020). While

results from the present study reinforce the importance of methodological choices in research, abagen

draws significant inspiration from these software packages in providing a set of tools designed to

overcome such concerns when working with the AHBA.

While the AHBA dataset remains the only one of its kind, the abagen toolbox is designed to be used

more broadly as similar datasets become available. That is, the preprocessing functions in abagen can

be applied to other microarray expression datasets assuming, for example, availability of stereotactic

coordinates. As new imaging transcriptomic datasets are developed and become more widely used,

abagen functionality for creating standardized processing pipelines will only become more important.

By developing the toolbox openly on GitHub (https:// github. com/ rmarkello/ abagen), it is our hope

that abagen can serve as a foundational, community tool for use in imaging transcriptomics research.

One consideration for future work on this topic is that the pipelines tested cover only a portion

of the potential variability possible when processing AHBA data (Table�1). For example, a growing

body of research has begun to examine how choice of brain parcellation may impact imaging analyses

(e.g. Craddock et�al., 2012; Thirion et�al., 2014; Mess�, 2020; Markello and Misic, 2021). While

we only assessed processing pipelines using the Desikan- Killiany atlas, many other atlases have been

used with the AHBA and it remains unclear how this variation may impact research findings. We also

did not investigate whether donor- specific parcellations may impact analyses, a processing choice

used in several published research findings (Anderson et�al., 2020b; Romero- Garcia et�al., 2018;

Burt et�al., 2018). Although there is significant evidence suggesting inter- individual variability in brain

region definition (e.g. Gordon et�al., 2017; Kong et�al., 2019; Dickie et�al., 2018), the process

of generating individualized brain parcellations is fraught with methodological choices and requires

careful data processing. Given the quality of the MRI data provided alongside the transcriptomic

data in the AHBA—including important differences in scanning protocol and procedures between

donors—creating donor- specific parcellations may be a large source of variability between pipelines.

Another limitation of the presented results is that we are unable to make categorical statements

about which processing options are ’best’ for the AHBA. First, there is no ground truth against which

one can assess what the optimal set of processing parameters. One potential solution to this could be

to examine the robustness of pipelines based on a leave- one- donor- out strategy (e.g. Arnatkeviciute

et�al., 2019; Vogel et�al., 2020), wherein analyses are repeated six times, omitting one donor each

time, to ensure that none of the donors are unduly influencing analytic estimates. This approach is

likely to become more useful as data from more individuals becomes available, but at present may be

a worthwhile approach for assessing whether chosen processing parameters are appropriate. More-

over, the optimal set of processing parameters may vary based on research question. For instance, in

most applications gene normalization is appropriate, as it ensures that downstream analyses are not

driven by a small subset of highly expressed genes. However, in other applications it may be desirable

to retain the variance contributed by genes to accurately reflect their relative expression levels. For

example, many genes in AHBA are not brain- specific, so normalization will amplify their expression

patterns, potentially obscuring more relevant expression information. This can be avoided by sub-

selecting genes in a hypothesis- driven manner and skipping the normalization step altogether.

Nonetheless, we offer two alternative solutions for researchers who want to continue using the

AHBA data. First, similar to the current report, researchers can conduct a comprehensive analysis

with the AHBA, running multiple processing pipelines and showing the entire distribution of gener-

ated statistical estimates; however, this process can be computationally prohibitive and may impair

researchers’ abilities to interpret their findings (Steegen et�al., 2016). A less costly alternative, then,

Page 14

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

14 of 27

is for the imaging transcriptomic research community to converge on a set of data- driven processing

pipeline for the AHBA that can be used across research groups. We believe the abagen toolbox—

with its comprehensive workflows, well- informed default parameter choices, and detailed documen-

tation—can facilitate this process. While we acknowledge that some research groups may have strong

reasons for wanting to use specific (i.e. non- default) processing choices, in these instances we urge

clear and detailed reporting of the methods used—such as via the automated reporting functionality

from the abagen toolbox.

Altogether, the current report highlights the problem of processing variability in analyses using

the AHBA, impacting many research studies in the burgeoning field of imaging transcriptomics. We

demonstrate how different processing options can influence statistical estimates of analyses relating

data from the AHBA to imaging- derived phenotypes, and present the abagen toolbox as a promising

potential solution to this issue.

Materials and methods

Code and data availability

All code used for data processing, analysis, and figure generation is available on GitHub (https://

github. com/ netneurolab/ markello_ transcriptome; Markello, 2021a copy archived at swh:1:rev:3abb-

c85596a5baacd93e5e9e56c906c9dbb080f3)and directly relies on the following open- source Python

packages: IPython (Perez and Granger, 2007), Jupyter (Kluyver et�al., 2016), Matplotlib (Hunter,

2007), NiBabel (Brett et�al., 2019), NumPy (Oliphant, 2006; van der Walt et�al., 2011; Harris et�al.,

2020), Pandas (McKinney, 2010), PySurfer (Waskom et�al., 2020), Scikit- learn (Pedregosa et�al.,

2011), SciPy (Virtanen et�al., 2020), and Seaborn (Waskom et�al., 2018).

Data

Allen human brain atlas

The Allen Human Brain Atlas (AHBA) is an open- access online resource containing whole- brain

microarray gene expression data obtained from post- mortem tissue samples of six adult human

donors (https:// human. brain- map. org; Allen Institute for Brain Science, 2013; Hawrylycz et�al.,

2012). Expression data for over 20,000 genes were sampled from 3702 distinct tissue samples across

the six donors (one female, ages 24–57), providing the most spatially comprehensive assay of gene

expression in the human brain. Normalized microarray expression data were downloaded for all six

donors; RNAseq data were downloaded for the two donors with relevant data.

Human connectome project

Group- averaged T1w/T2w (a proxy for intracortical myelin) data were downloaded from the S1200

release of the Human Connectome Project (HCP; Van Essen et�al., 2013) and used without further

processing.

Brain parcellations

All analyses were performed with the Desikan- Killiany atlas (DK; 68 cortical nodes), an anatomical

parcellation generated by delineating regions based on gyral boundaries (Desikan et�al., 2006). To

explore the impact of volumetric- versus surface- based parcellations we used a version of the DK atlas

in (1) volumetric MNI152, and (2) surface fsaverage5 space; both versions are provided directly with

the abagen toolbox. To facilitate comparison between volumetric- and surface- based parcellations,

samples from the cerebellum, subcortex and brainstem were omitted.

The abagen toolbox

Source code for abagen is available on GitHub (https:// github. com/ rmarkello/ abagen) and is provided

under the three- clause BSD license (https:// opensource. org/ licenses/ BSD- 3- Clause). We have inte-

grated abagen with Zenodo, which generates unique digital object identifiers (DOIs) for each new

release of the toolbox (e.g. https:// doi. org/ 10. 5281/ zenodo. 3451463). Researchers can install abagen

as a Python package via the PyPi repository (https:// pypi. org/ project/ abagen/), and can access

comprehensive online documentation via ReadTheDocs (https:// abagen. readthedocs. io/).

Page 15

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

15 of 27

Gene expression pipelines

Most neuroimaging analyses using the AHBA must first convert the ‘raw’ data into a pre- processed

brain region- by- gene expression matrix. To investigate the extent to which different processing proce-

dures might impact downstream analyses, we used abagen to modify 17 distinct processing steps in

the generation of region- by- gene matrices from the original AHBA data. Each unique set of these 17

processing choices and parameters constitutes a pipeline, yielding 746,496 unique pipelines. Here,

we describe in detail the 17 processing steps and respective methods for each option that we exam-

ined in our analyses (refer to Table�1 for a summary overview of these choices or refer to the abagen

documentation for implementation details; https:// abagen. readthedocs. io).

Volumetric or surface atlas

Aggregation of tissue samples from the AHBA into discrete brain regions requires researchers to

supply an atlas (or parcellation). There are many brain atlases available for use; however, they typically

exist in one of two forms: defined (1) in 3D ‘volumetric’ space, or (2) in ‘surface’ space on a 2D repre-

sentation of the cortical sheet. Many atlases can exist in both of these formats and so beyond the

choice of parcellation, researchers must select which representation to use when processing AHBA

samples. Choice of atlas may impact how many and which samples are matched to brain regions. In

the current manuscript, we examined a volume- and surface- based representation of the Desikan-

Killiany atlas (see Materials and methods: Data; Desikan et�al., 2006). Note that both versions of the

atlas used in the reported analyses are included with the abagen software distribution.

Individualized or group-level atlas

There is growing recognition that brain parcellations derived at the group level tend to obscure indi-

vidual differences in anatomy or function (e.g. Gordon et�al., 2017; Kong et�al., 2019; Dickie et�al.,

2018). Researchers working with the AHBA have thus begun to generate donor- specific parcellations,

using individualized atlases to match tissue samples to brain regions. The individualization process

can vary dramatically depending on whether researchers are using volumetric or surface atlases and

whether they are operating in ‘native’ or standard (i.e. group) space. Because of the immense vari-

ability inherent to the individualization process itself, we opted not to explore this parameter in the

current manuscript.

Use non-linear MNI coordinates

With its initial release the AHBA provided stereotactic coordinates for each tissue sample in MNI

space (Fonov et�al., 2009; Fonov et�al., 2011; Collins et�al., 1999); however, two of the six donor

brains were scanned in cranio and coordinates were derived using affine registrations to the MNI

template, while the remaining four were scanned ex vivo and a non- linear registration was used to

generate coordinates. More recently, Gorgolewski et�al., 2014 used ANTS (Avants et�al., 2011) to

perform a standardized, manually corrected non- linear diffeomorphic registration of all the donor

brains to MNI space. Analyses collating tissue samples into distinct brain regions often rely on MNI

coordinates to match samples to regions, and researchers must choose whether to use the original

coordinates provided with the AHBA or the newer, non- linearly generated coordinates. In the current

manuscript, we assessed the impact of using (1) the original MNI coordinates and (2) the updated

coordinates from Gorgolewski et�al., 2014.

Mirror samples across left-right hemisphere

Only the first two donors included in the AHBA had tissue samples taken from the right hemisphere.

Preliminary analyses of these data revealed minimal lateralization of microarray expression, and so

samples were collected exclusively from the left hemisphere for the following four donors (Hawrylycz

et�al., 2012; Hawrylycz et�al., 2015). This irregular sampling resulted in limited spatial coverage of

expression in the right hemisphere; to resolve this, some researchers have opted to mirror existing

tissue samples across the left- right hemisphere boundary (Romero- Garcia et�al., 2018). Researchers

must decide whether to perform sample mirroring, and, if so, whether they should mirror unilaterally

(i.e. only right- to- left or left- to- right) or bilaterally (i.e. both right- to- left and left- to- right). In the current

manuscript, we assessed (1) no mirroring, (2) left- to- right mirroring, and (3) bilateral mirroring. The

Page 16

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

16 of 27

option for mirroring right- to- left was omitted as this is only useful when analyses selectively consider

the left hemisphere, not the whole brain.

Update probe-to-gene annotations

The 60- base- pair probes used to assess microarray expression in the AHBA were annotated with

their corresponding gene (or lack thereof) when the data were publicly released. However, as the

human reference genome is updated these annotations become increasingly out- of- date. Thus, when

researchers choose to use the AHBA data they must decide whether to use the original gene annota-

tions or more recently- generated annotations. In the current manuscript, we assessed using both the

original annotations and those generated by Arnatkeviciute et�al., 2019.

Intensity-based filtering threshold

Data from the AHBA are provided with information indicating whether the expression of each microarray

probe exceeds the expression levels of background signal. Using this information, researchers can

choose to perform an intensity- based filtering procedure wherein probes are only considered if their

expression levels are greater than background across a specified percentage of tissue samples. In the

current manuscript, we considered three degrees of intensity- based filtering: (1) no filtering (all probes

used), (2) 25�% filtering (probes used if they exceeded background for more than 25�% of all samples),

and (3) median filtering (probes used if they exceeded background for more than 50�% of all samples).

Inter-areal similarity threshold

The expression value of some tissue samples in the AHBA differ markedly from all other samples in the

dataset. While this could be driven by real spatial variability in expression values throughout the brain,

it is also possible that this variability is artifactual. Researchers can opt to assess the inter- areal simi-

larity of tissue samples, quantifying those that differ from the rest by a given threshold, and remove

them from consideration. To our knowledge, this processing step has only been implemented in a

single research study (Burt et�al., 2018), and as such we do not consider it in the current manuscript.

Probe selection method

The probes used to measure microarray expression levels in the AHBA are often redundant; that is,

there are frequently several probes indexing the same gene. Thus, at some point researchers must

transition from measuring probe expression levels to measuring gene expression levels. Effectively,

this means selecting from or condensing the redundant probes for each gene. There have been at

least eight methods proposed in the literature for this process, including selecting a single probe with

the (1) max intensity across samples, (2) max variance across samples, (3) highest loading on the first

principal components across samples, (4) highest correlation to other probes (or max intensity across

samples when only two probes exist), (5) highest correlation to other probes (or max variance across

samples when only two probes exist), (6) highest differential stability across donors, (7) highest fidelity

to simultaneously- acquired RNAseq data, or (8) simply averaging all probes indexing the same gene.

In the current manuscript we only consider six of the most commonly- applied methods (i.e. 1, 4, 5, 6,

7, and 8); the other methods (i.e. 2 and 3) have only been reported in a single research study (Negi

and Guda, 2017 and Parkes et�al., 2017, respectively) and as such we do not consider them.

Donor-specific probe selection

Probe selection (described above) often requires applying some selection criterion to gene expres-

sion levels across tissue samples. For these methods, the specified criterion can be measured across

donors (i.e. aggregating tissues samples from donors) or independently for each donor. The latter

case—performing probe selection independently for each donor—allows for two additional options:

(1) using whichever probe is chosen for each donor, even if it differs from the other donors, or (2)

using the most- commonly selected probe for all donors. In the current manuscript, we considered all

three of these options: (1) aggregating samples across donors, (2) performing probe selection inde-

pendently for each donor, and (3) using the most commonly- selected probe across donors.

Page 17

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

17 of 27

Missing data method

Due to the irregular spatial sampling of data in the AHBA some brain regions may not be assigned

any corresponding microarray expression data. Researchers can opt to simply omit these regions from

subsequent analyses; however, in some cases, this is not desirable as the spatial distribution of the

missing samples may not be random and discarding them may bias resulting estimates. Two options

for handling missing data have been proposed in the literature, including filling missing regions with

expression data from nearby regions (i.e. nearest- neighbors interpolation; Whitaker et�al., 2016), or

interpolating data in missing regions based on nearby samples (i.e. linear interpolation; Burt et�al.,

2018). In the current manuscript, we tested two options: (1) omit brain regions with missing data

entirely from subsequent analyses, and (2) fill missing data with expression values using nearest-

neighbors interpolation. Linear interpolation has been sparingly used in the published literature

(e.g. Burt et�al., 2018; Romero- Garcia et�al., 2018) and carries an increase in computational cost

(approximately an order of magnitude higher than nearest neighbors interpolation); as such, we do

not consider it in the current manuscript.

Sample-to-region matching tolerance

Volumetric atlases

While most tissue samples from the AHBA will fall directly within the brain regions delineated by

most parcellations, some samples may fall outside the boundaries of these regions. Researchers can

nonetheless choose to permit assigning these nearby samples to a given region, but will often set a

distance threshold beyond which samples cannot be assigned. In the current manuscript, we consid-

ered three distance tolerances: 0�mm (i.e. samples must fall exactly within a region), 1�mm, and 2�mm.

Surface atlases

Because tissue samples from the AHBA are defined in volumetric space, matching them to parcels

defined on a surface- based atlas requires different considerations than with volumetric atlases.

Notably, all samples will have non- zero distances from surface vertices; therefore, when matching

to surface atlases distance thresholds are generally considered in terms of standard deviations (Burt

et�al., 2018; Anderson et�al., 2020b). In this way, all samples are matched to the surface and then

those that are more than the specified standard deviation(s) above the mean away from the surface

are excluded. In the current manuscript we tested three standard deviation distance tolerances: 0�s.d.

(i.e. all samples farther than the average distance are excluded), 1�s.d., and 2�s.d.

Sample normalization method

Prior to aggregating microarray expression data across donors, researchers can optionally normalize

the microarray expression data for each tissue sample across all represented genes (i.e., perform row-

wise normalization). This procedure can account for between- sample differences in gene expression

potentially driven by measurement errors. There is a number of techniques that have been proposed

to normalize expression values; however, in the current manuscript, we considered three normaliza-

tion methods: (1) no normalization, (2) a z- score transform, and (3) a scaled robust sigmoid transform

(Fulcher et�al., 2013).

Gene normalization method

Prior to aggregating microarray expression data across donors, researchers can optionally normalize

the microarray expression data for each represented gene across tissue samples (i.e. perform column-

wise normalization). This procedure can account for inter- individual (donor- specific) differences in

gene expression data, which remain present in the AHBA despite batch corrections performed by the

Allen Institute prior to releasing the data. In the current manuscript, we considered three normaliza-

tion methods: (1) no normalization, (2) a z- score transform, and (3) a scaled robust sigmoid transform

(Fulcher et�al., 2013).

Normalizing only matched samples

Due to choices in other processing steps (e.g. Volume- or surface- based atlas, Sample- to- region

matching tolerance) some tissue samples from the AHBA may not be assigned to any region in a

given brain atlas. During gene normalization, where expression from each gene is normalized across

Page 18

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

18 of 27

tissue samples, researchers must decide whether to use (1) only those tissue samples matched to

brain regions, or (2) the entire corpus of tissue samples, irrespective of whether they will be included

in the final, processed regional expression matrix. In the current manuscript we consider both of these

options.

Normalizing discrete structures

There is known variation in gene expression values between tissue samples taken from distinct struc-

tural classes (i.e. samples taken from neocortex may have different expression values than those from

the brainstem). When performing gene normalization researchers can opt to normalize (1) across all

samples irrespective of the structure from which they derive or (2) independently for samples taken

from different brain structures. Although the brain atlas used in the current manuscript represents only

cortical parcels, this processing choice can interact with Normalizing only matched samples to impact

resulting expression values and we therefore test both options.

Note that in the abagen toolbox structural classes are operationalized as: (1) cortex, (2) subcortex

and brainstem, (3) cerebellum, and (4) white matter. Subcortex and brainstem are considered as one

class because neuroanatomical delineation between these regions are widely contested and expres-

sion values in these regions tend to be more similar to one another than to other regions (i.e. data-

driven clustering of samples tends to assign subcortical and brainstem samples together).

Sample-to-region combination method

Once tissue samples have been assigned to brain regions they need to be combined to generate a

single expression profile; however, due to sampling differences between donors, some donors may

have more tissue samples assigned to a given brain region than others. Thus, researchers must decide

whether to aggregate samples (1) within each brain region independently for each donor and then

across donors, or (2) simultaneously across all donors. In the latter case, donors with a higher number

of samples matched to a region will contribute more to the expression profile of a given region (Arnat-

keviciute et�al., 2019). In the current manuscript, we test both of these options.

Sample-to-region combination metric

When aggregating tissue samples into brain regions researchers must decide what aggregation metric

they want to use. Although any statistical estimate could be considered, in practice an estimate of

central tendency such as the mean expression values across tissue samples is most applicable. In the

current manuscript, we test aggregation with both the (1) mean and (2) median.

Analytic approaches

Prototypical analyses relying on parcellated microarray expression data from the AHBA fall into three

broad categories (Fornito et�al., 2019):

1. Correlated gene expression: Examining the correlation between distinct brain regions across

genes (i.e. using the region- by- region correlation matrix);

2. Gene co- expression: Examining the correlation between gene expression profiles across brain

regions (i.e. using the gene- by- gene correlation matrix); or,

3. Regional gene expression: Examining the expression profile of one (or more) genes across brain

regions (i.e. using selected columns of the region- by- gene expression matrix).

In order to examine the interaction between processing options and analytic method, we performed

one analysis from each of these three categories, described below, for every output of the 746,496

processing pipelines.

Correlated gene expression

Researchers have reliably found a relationship between correlated gene expression in the brain and

the distance between brain regions: that is, brain regions that are farther away from one another tend

to have less similar gene expression profiles (Richiardi et�al., 2015; Richiardi et�al., 2017; Krienen

et�al., 2016; V�rtes et�al., 2016; Arnatkeviciute et�al., 2019). In order to examine the impact of

processing choices on this relationship, we computed the Spearman correlation between the upper

triangle of the regional distance matrix (Euclidean distance between brain regions) and the upper

triangle of each correlated gene expression matrix (Figure�1a, left). Brain regions for which no gene

Page 19

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

19 of 27

expression data were available (dependent on pipeline options) were not included in the correlation.

Note that this relationship is likely exponential (Arnatkeviciute et�al., 2019); however, we calculated

the Spearman coefficient as it is more computationally tractable and it should exhibit similar variability

across pipelines.

Gene co-expression

Researchers have previously shown that gene expression in the brain tends to organize into function-

ally defined communities or modules (Oldham et�al., 2008; Hawrylycz et�al., 2012). We examined

the extent to which functional gene modules derived from a separate transcriptomic dataset (Oldham

et�al., 2008) mapped onto the gene co- expression matrices generated from the different processing

pipelines. For each gene- by- gene matrix, we calculated the silhouette score (Rousseeuw, 1987) of

the gene modules on a modified version of gene co- expression matrix (calculating Euclidean distance

between genes instead of gene correlations; Figure�1a, middle) via:

s = 1

N ∑N

i=1

b(i)−a(i)

max{a(i),b(i)}

where a(i) is the average distance of a data point to all other data points in the same cluster, b(i) is

the mean distance of data point to the nearest neighboring cluster, and N is the total number of data

points. The final silhouette score s ranges from –1 to�+1, where positive values indicate assortative and

negative values indicate disassortative clusters.

Note that the original gene modules were defined using a weighted gene co- expression network

analysis (WGCNA), which generally requires performing additional processing steps on the gene

co- expression matrix. Since we used the raw gene co- expression matrix in the current analysis, we

expect lower silhouette scores than those reported in the initial manuscript where the gene commu-

nities were initially defined; however, the variance in scores between pipelines should not be signifi-

cantly impacted by this choice.

Regional gene expression

Researchers recently highlighted how the principal component of gene expression in the brain closely

mirrors the spatial variation observed in MRI- derived T1w/T2w measurements (typically used as a

proxy for myelination; Burt et�al., 2018). We examined whether this relationship was present across

the outputs of the different pipelines, measuring the Spearman correlation between the T1w/T2w ratio

and the first principal component of the regional gene expression matrix (Figure�1a, right). Regional

gene expression matrices were mean- centered prior to extraction of the principal component.

Assessing pipeline impact

In order to examine the impact of each processing option on the resulting analyses, we calculated

a difference score, measuring the extent to which changing each option—holding all other options

constant—influenced the derived metrics (i.e. correlation, silhouette score). When there were only two

choices for a given option the impact was calculated as the absolute value of the difference between

the two choices. When there were more than two choices and choices were ordinal (e.g. sample- to-

region matching tolerance) the impact was calculated as the average of the absolute value of the

difference between adjacent choices. When there were more than two choices and the choices were

categorical (e.g. probe selection method) the impact was calculated as the average of the absolute

value of the difference between all combinations of choices. These calculations yielded a distribution

of ‘impact’ estimates (i.e. change scores) for each processing option; we represented the final impact

score for each processing option as the average of these distributions, taken independently for each

of the three analyses. Impact estimates were rank- ordered (where the most impactful parameter was

given a rank of one, the second most impactful a rank of two, and so on) to enable direct comparison

across the different statistical estimates derived from the three analyses.

Pipeline dimensionality reduction

To investigate qualitative differences between the processing pipelines we performed a principal

components analysis (PCA) on the matrix of estimates from the three statistical analyses (i.e. the

746,496 � 3 matrix). We mean- centered the columns of the matrix and extracted the first two principal

Page 20

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

20 of 27

components, examining how pipeline scores were distributed along these two components in relation

to different processing options. These principal component highlight the closeness of the estimate

generated by each pipeline along the dimensions of maximum statistical variation; that is, two pipe-

lines that are closer together in the reduced- dimension space yielded more similar statistical estimates

than two pipelines that are farther apart.

Reproducing pipelines from the literature

Although all the processing options explored in the current manuscript are reasonable or viable

choices that researchers could make when preparing the AHBA for analysis, in reality these have not

all been used in the published literature. In order to examine how pipelines used in the literature

compared to those that we assessed, we selected nine articles that relied on data from the AHBA to

support a primary research finding and reproduced their processing pipelines in abagen (Hawrylycz

et�al., 2015; French and Paus, 2015; Whitaker et�al., 2016; Krienen et�al., 2016; Anderson et�al.,

2018; Burt et�al., 2018; Romero- Garcia et�al., 2018; Anderson et�al., 2020b; Liu et�al., 2020). Note

that these articles used a variety of parcellations and so to ensure comparability across pipelines we

standardized this parameter, using the Desikan- Killiany atlas in all instances. One parameter that we

did not assess in the pipelines explored in the current manuscript—whether to use individualized,

donor- specific parcellations or a group- level atlas—was frequently varied in the published pipelines.

Thus, when reproducing pipelines that called for individualized volumetric atlases we relied on the

donor- specific Desikan- Killiany parcellations provided by Arnatkeviciute et�al., 2019; when repro-

ducing pipelines with individualized surface atlases we relied on the donor- specific Desikan- Killiany

parcellations provided by Romero- Garcia et�al., 2018.

As not all of the original manuscripts detailed the processing choices for each of the 17 steps in

the abagen workflow, when specific parameter choices were omitted we either: (1) used the default

setting if the parameter was required (e.g. using the mean for the ‘sample- to- region combination

metric’, since all pipelines must combine samples to regions), or (2) omitted the processing step

entirely if it is an optional step (e.g. not performing any gene normalization).

Acknowledgements

We thank Vincent Bazinet, Elizabeth DuPre, Justine Hansen, Golia Shafiei, Laura Su�rez, and Bertha

V�zquez- Rodr�guez for their comments and suggestions. This research was undertaken thanks in part

to funding from the Canada First Research Excellence Fund, awarded to McGill University for the

Healthy Brains for Healthy Lives initiative. This work was supported in part by funding provided by

Brain Canada, in partnership with Health Canada, for the Canadian Open Neuroscience Platform

initiative. RDM acknowledges support from the Fonds du Recherche Qu�bec - Nature et Technol-

ogies and the Canadian Open Neuroscience Platform. BM acknowledges support from the Natural

Sciences and Engineering Research Council of Canada (NSERC Discovery Grant RGPIN #017–04265)

and from the Canada Research Chairs Program. AF was supported by the Sylvia and Charles Viertel

Foundation and National Health and Medical Research Council (ID: 3274306). J- BP was partially

funded by National Institutes of Health (NIH) NIH- NIBIB P41 EB019936 (ReproNim) NIH- NIMH R01

MH083320 (CANDIShare) and NIH RF1 MH120021 (NIDM), the National Institute Of Mental Health of

the NIH under Award Number R01MH096906 (Neurosynth), and by Natural Sciences and Engineering

Research Council of Canada (NSERC).

Additional information

Funding

Funder

Grant reference number Author

Natural Sciences and

Engineering Research

Council of Canada

017-04265

Bratislav Misic

Page 21

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

21 of 27

Funder

Grant reference number Author

National Health and

Medical Research Council

3274306

Alex Fornito

National Institutes of

Health

NIH-NIBIB P41 EB019936

Jean-Baptiste Poline

The funders had no role in study design, data collection and interpretation, or the

decision to submit the work for publication.

Author contributions

Ross D Markello, Conceptualization, Formal analysis, Investigation, Methodology, Resources, Soft-

ware, Validation, Visualization, Writing – original draft, Writing – review and editing; Aurina Arnat-

keviciute, Conceptualization, Data curation, Software, Writing – review and editing; Jean- Baptiste

Poline, Alex Fornito, Conceptualization, Writing – review and editing; Ben D Fulcher, Conceptualiza-

tion, Software, Writing – review and editing; Bratislav Misic, Conceptualization, Project administration,

Supervision, Visualization, Writing – original draft, Writing – review and editing

Author ORCIDs

Ross D Markello http:// orcid. org/ 0000- 0003- 1057- 1336

Jean- Baptiste Poline http:// orcid. org/ 0000- 0002- 9794- 749X

Ben D Fulcher http:// orcid. org/ 0000- 0002- 3003- 4055

Bratislav Misic http:// orcid. org/ 0000- 0003- 0307- 2862

Decision letter and Author response

Decision letter https:// doi. org/ 10. 7554/ eLife. 72129. sa1

Author response https:// doi. org/ 10. 7554/ eLife. 72129. sa2

Additional files

Supplementary files

• Transparent reporting form

• Supplementary file 1. Default abagen pipeline options. The default settings for the 17 processing

steps considered when processing the AHBA data with abagen. An entry of ‘—' indicates that

this is a required, user- supplied parameter. A blank entry indicates that the processing step is not

implemented by default. Refer to Table�1 and Methods: Gene expression pipelines for further

details.

Data availability

All datasets used in this study are publicly available. Detailed information about the datasets and how

to access them are described in the manuscript.

References

Allen Institute for Brain Science. 2013. Allen Human Brain Atlas online documentation. Allen Institute

Publications for Brain Science. https:// help. brain- map. org/ display/ humanbrain/ Documentation

Anderson KM, Krienen FM, Choi EY, Reinen JM, Yeo BTT, Holmes AJ. 2018. Gene expression links functional

networks across cortex and striatum. Nature Communications 9:1428. DOI: https:// doi. org/ 10. 1038/ s41467-

018- 03811- x, PMID: 29651138

Anderson KM, Collins MA, Chin R, Ge T, Rosenberg MD, Holmes AJ. 2020a. Transcriptional and imaging- genetic

association of cortical interneurons, brain function, and schizophrenia risk. Nature Communications 11:2889.

DOI: https:// doi. org/ 10. 1038/ s41467- 020- 16710- x, PMID: 32514083

Anderson KM, Collins MA, Kong R, Fang K, Li J, He T, Chekroud AM, Yeo BTT, Holmes AJ. 2020b. Convergent

molecular, cellular, and cortical neuroimaging signatures of major depressive disorder. PNAS 117:25138–25149.

DOI: https:// doi. org/ 10. 1073/ pnas. 2008004117, PMID: 32958675

Arnatkeviciute A, Fulcher BD, Fornito A. 2019. A practical guide to linking brain- wide gene expression and

neuroimaging data. NeuroImage 189:353–367. DOI: https:// doi. org/ 10. 1016/ j. neuroimage. 2019. 01. 011, PMID:

30648605

Arnatkevičiūtė A, Fulcher B, Oldham S, Tiego J, Paquola C, Gerring Z, Aquino K, Hawi Z, Johnson B, Ball G,

Klein M, Deco G, Franke B, Bellgrove M, Fornito A. 2020. Genetic Influences on Hub Connectivity of the

Human Connectome. [bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 2020. 06. 21. 163915

Page 22

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

22 of 27

Arnatkevičiūtė A, Fulcher B, Bellgrove M, Fornito A. 2021. Where the Genome Meets the Connectome:

Understanding How Genes Shape Human Brain Connectivity. [PsyArXiv]. DOI: https:// doi. org/ 10. 31234/ osf. io/

hqgz7

Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. 2011. A reproducible evaluation of ANTs similarity

metric performance in brain image registration. NeuroImage 54:2033–2044. DOI: https:// doi. org/ 10. 1016/ j.

neuroimage. 2010. 09. 025, PMID: 20851191

Beliveau V, Ganz M, Feng L, Ozenne B, H�jgaard L, Fisher PM, Svarer C, Greve DN, Knudsen GM. 2017. A

High- Resolution In Vivo Atlas of the Human Brain’s Serotonin System. The Journal of Neuroscience 37:120–

128. DOI: https:// doi. org/ 10. 1523/ JNEUROSCI. 2830- 16. 2016, PMID: 28053035

Benkarim O, Paquola C, Park B -y, Hong SJ, Royer J, de Wael R, Larivi�re S, Valk S, Bzdok D, Mottron L. 2020.

Functional Idiosyncrasy Has a Shared Topography with Group- Level Connectivity Alterations in Autism.

[bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 2020. 12. 18. 423291

Betzel RF, Bassett DS. 2018. Specificity and robustness of long- distance connections in weighted, interareal

connectomes. PNAS 115:E4880–E4889. DOI: https:// doi. org/ 10. 1073/ pnas. 1720186115, PMID: 29739890

Bhagwat N, Barry A, Dickie EW, Brown ST, Devenyi GA, Hatano K, DuPre E, Dagher A, Chakravarty M,

Greenwood CMT, Misic B, Kennedy DN, Poline J- B. 2021. Understanding the impact of preprocessing pipelines

on neuroimaging cortical surface analyses. GigaScience 10:giaa155. DOI: https:// doi. org/ 10. 1093/ gigascience/

giaa155, PMID: 33481004

Botvinik- Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, Kirchler M, Iwanir R,

Mumford JA, Adcock RA, Avesani P, Baczkowski BM, Bajracharya A, Bakst L, Ball S, Barilari M, Bault N,

Beaton D, Beitner J, Benoit RG, et�al. 2020. Variability in the analysis of a single neuroimaging dataset by many

teams. Nature 582:84–88. DOI: https:// doi. org/ 10. 1038/ s41586- 020- 2314- 9, PMID: 32483374

Brett M, Markiewicz CJ, Hanke M, C�t� MA, Cipollini B, McCarthy P, Cheng CP, Halchenko YO, Cottaar M,

Ghosh S. 2019. Nipy/Nibabel [Zenodo]. DOI: https:// doi. org/ 10. 5281/ zenodo. 591597

Brown JA, Lee AJ, Pasquini L, Seeley WW. 2021. A Dynamic Gradient Architecture Generates Brain Activity

States. [bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 2020. 08. 12. 248112

Burt JB, Demirtaş M, Eckner WJ, Navejar NM, Ji JL, Martin WJ, Bernacchia A, Anticevic A, Murray JD, . 2018.

Hierarchy of transcriptomic specialization across human cortex captured by structural neuroimaging

topography. Nature Neuroscience 21:1251–1259. DOI: https:// doi. org/ 10. 1038/ s41593- 018- 0195- 0, PMID:

30082915

Carp J. 2012. On the plurality of (methodological) worlds: estimating the analytic flexibility of FMRI experiments.

Frontiers in Neuroscience 6:149. DOI: https:// doi. org/ 10. 3389/ fnins. 2012. 00149, PMID: 23087605

Cieslak M, Cook PA, He X, Yeh FC, Dhollander T, Adebimpe A, Aguirre GK, Bassett DS, Betzel RF, Bourque J.

2020. QSIPrep: An Integrative Platform for Preprocessing and Reconstructing Diffusion MRI. [bioRxiv]. DOI:

https:// doi. org/ 10. 1101/ 2020. 09. 04. 282269

Ciric R, Wolf DH, Power JD, Roalf DR, Baum GL, Ruparel K, Shinohara RT, Elliott MA, Eickhoff SB, Davatzikos C,

Gur RC, Gur RE, Bassett DS, Satterthwaite TD. 2017. Benchmarking of participant- level confound regression

strategies for the control of motion artifact in studies of functional connectivity. NeuroImage 154:174–187.

DOI: https:// doi. org/ 10. 1016/ j. neuroimage. 2017. 03. 020, PMID: 28302591

Collins DL, Zijdenbos AP, Baar� WF, Evans AC. 1999. ANIMAL+INSECT: improved cortical structure

segmentation. DBLP. . DOI: https:// doi. org/ 10. 1007/ 3- 540- 48714- X_ 16

Craddock RC, James GA, Holtzheimer PE, Hu XP, Mayberg HS. 2012. A whole brain fMRI atlas generated via

spatially constrained spectral clustering. Human Brain Mapping 33:1914–1928. DOI: https:// doi. org/ 10. 1002/

hbm. 21333, PMID: 21769991

Darmanis S, Sloan SA, Zhang Y, Enge M, Caneda C, Shuer LM, Hayden Gephart MG, Barres BA, Quake SR. 2015.

A survey of human brain transcriptome diversity at the single cell level. PNAS 112:7285–7290. DOI: https:// doi.

org/ 10. 1073/ pnas. 1507125112, PMID: 26060301

Deco G, Aquino KM, Arnatkeviciute A, Oldham S, Sabaroedin K, Rogasch NC, Kringelbach ML, Fornito A. 2020.

Dynamical Consequences of Regional Heterogeneity in the Brains Transcriptional Landscape. [bioRxiv]. DOI:

https:// doi. org/ 10. 1101/ 2020. 10. 28. 359943

Demirtaş M, Burt JB, Helmer M, Ji JL, Adkinson BD, Glasser MF, Van Essen DC, Sotiropoulos SN, Anticevic A,

Murray JD. 2019. Hierarchical Heterogeneity across Human Cortex Shapes Large- Scale Neural Dynamics.

Neuron 101:1181–1194. DOI: https:// doi. org/ 10. 1016/ j. neuron. 2019. 01. 017, PMID: 30744986

Desikan RS, S�gonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP,

Hyman BT, Albert MS, Killiany RJ. 2006. An automated labeling system for subdividing the human cerebral

cortex on MRI scans into gyral based regions of interest. NeuroImage 31:968–980. DOI: https:// doi. org/ 10.

1016/ j. neuroimage. 2006. 01. 021, PMID: 16530430

Dickie EW, Ameis SH, Shahab S, Calarco N, Smith DE, Miranda D, Viviano JD, Voineskos AN. 2018. Personalized

Intrinsic Network Topography Mapping and Functional Connectivity Deficits in Autism Spectrum Disorder.

Biological Psychiatry 84:278–286. DOI: https:// doi. org/ 10. 1016/ j. biopsych. 2018. 02. 1174, PMID: 29703592

Ding Y, Zhao K, Che T, Du K, Sun H, Liu S, Zheng Y, Li S, Liu B, Liu Y, Alzheimer’s Disease Neuroimaging Initiative.

2021. Quantitative Radiomic Features as New Biomarkers for Alzheimer’s Disease: An Amyloid PET Study.

Cerebral Cortex 31:3950–3961. DOI: https:// doi. org/ 10. 1093/ cercor/ bhab061, PMID: 33884402

Dragicevic P, Jansen Y, Sarma A, Kay M, Chevalier F. 2019. Increasing the Transparency of Research Papers with

Explorable Multiverse Analyses. The 2019 CHI Conference. . DOI: https:// doi. org/ 10. 1145/ 3290605. 3300295

Page 23

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

23 of 27

Esteban O, Markiewicz CJ, Blair RW, Moodie CA, Isik AI, Erramuzpe A, Kent JD, Goncalves M, DuPre E,

Snyder M, Oya H, Ghosh SS, Wright J, Durnez J, Poldrack RA, Gorgolewski KJ. 2019. fMRIPrep: a robust

preprocessing pipeline for functional MRI. Nature Methods 16:111–116. DOI: https:// doi. org/ 10. 1038/

s41592- 018- 0235- 4, PMID: 30532080

Fonov VS, Evans AC, McKinstry RC, Almli C, Collins D. 2009. Unbiased nonlinear average age- appropriate brain

templates from birth to adulthood. NeuroImage 47:S102. DOI: https:// doi. org/ 10. 1016/ S1053- 8119( 09)

70884-5

Fonov V, Evans AC, Botteron K, Almli CR, McKinstry RC, Collins DL, Group BDC. 2011. Unbiased average

age- appropriate atlases for pediatric studies. NeuroImage 54:313–327. DOI: https:// doi. org/ 10. 1016/ j.

neuroimage. 2010. 07. 033, PMID: 20656036

Fornito A, Arnatkevičiūtė A, Fulcher BD. 2019. Bridging the Gap between Connectome and Transcriptome.

Trends in Cognitive Sciences 23:34–50. DOI: https:// doi. org/ 10. 1016/ j. tics. 2018. 10. 005, PMID: 30455082

Fox AS, Chang LJ, Gorgolewski KJ, Yarkoni T. 2014. Bridging Psychology and Genetics Using Large- Scale Spatial

Analysis of Neuroimaging and Neurogenetic Data. [bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 012310

French L, Paus T. 2015. A FreeSurfer view of the cortical transcriptome generated from the Allen Human Brain

Atlas. Frontiers in Neuroscience 9:323. DOI: https:// doi. org/ 10. 3389/ fnins. 2015. 00323, PMID: 26441498

Fulcher BD, Little MA, Jones NS. 2013. Highly comparative time- series analysis: the empirical structure of time

series and their methods. Journal of the Royal Society, Interface 10:20130048. DOI: https:// doi. org/ 10. 1098/

rsif. 2013. 0048, PMID: 23554344

Fulcher BD. 2019. Discovering Conserved Properties of Brain Organization Through Multimodal Integration and

Interspecies Comparison. Journal of Experimental Neuroscience 13:1179069519862047. DOI: https:// doi. org/

10. 1177/ 1179069519862047, PMID: 31312085

Fulcher BD, Murray JD, Zerbi V, Wang XJ. 2019. Multimodal gradients across mouse cortex. PNAS 116:4689–

4695. DOI: https:// doi. org/ 10. 1073/ pnas. 1814144116, PMID: 30782826

Gandal MJ, Zhang P, Hadjimichael E, Walker RL, Chen C, Liu S, Won H, van Bakel H, Varghese M, Wang Y,

Shieh AW, Haney J, Parhami S, Belmont J, Kim M, Moran Losada P, Khan Z, Mleczko J, Xia Y, Dai R, et�al. 2018.

Transcriptome- wide isoform- level dysregulation in ASD, schizophrenia, and bipolar disorder. Science

362:eaat8127. DOI: https:// doi. org/ 10. 1126/ science. aat8127, PMID: 30545856

Gao R, van den Brink RL, Pfeffer T, Voytek B. 2020. Neuronal timescales are functionally dynamic and shaped by

cortical microarchitecture. eLife 9:e61277. DOI: https:// doi. org/ 10. 7554/ eLife. 61277, PMID: 33226336

Gordon EM, Laumann TO, Gilmore AW, Newbold DJ, Greene DJ, Berg JJ, Ortega M, Hoyt- Drazen C, Gratton C,

Sun H, Hampton JM, Coalson RS, Nguyen AL, McDermott KB, Shimony JS, Snyder AZ, Schlaggar BL,

Petersen SE, Nelson SM, Dosenbach NUF. 2017. Precision Functional Mapping of Individual Human Brains.

Neuron 95:791–807. DOI: https:// doi. org/ 10. 1016/ j. neuron. 2017. 07. 011, PMID: 28757305

Gorgolewski KJ, Fox AS, Chang L, Sch�fer A, Ar�lin K, Burmann I, Sacher J, Margulies DS. 2014. Tight fitting

genes: finding relations between statistical maps and gene expression patterns. F1000Research 5:1. DOI:

https:// doi. org/ 10. 7490/ F1000RESEARCH. 1097120.1

Gorgolewski KJ, Varoquaux G, Rivera G, Schwarz Y, Ghosh SS, Maumet C, Sochat VV, Nichols TE, Poldrack RA,

Poline J- B, Yarkoni T, Margulies DS. 2015. NeuroVault. org: a web- based repository for collecting and sharing

unthresholded statistical maps of the human brain. Frontiers in Neuroinformatics 9:8. DOI: https:// doi. org/ 10.

3389/ fninf. 2015. 00008, PMID: 25914639

Goulas A, Betzel RF, Hilgetag CC. 2019. Spatiotemporal ontogeny of brain wiring. Science Advances

5:eaav9694. DOI: https:// doi. org/ 10. 1126/ sciadv. aav9694, PMID: 31206020

Hansen JY, Markello RD, Vogel JW, Seidlitz J, Bzdok D, Misic B. 2021. Mapping gene transcription and

neurocognition across human neocortex. Nature Human Behaviour 5:1240–1250. DOI: https:// doi. org/ 10. 1038/

s41562- 021- 01082-z

Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S,

Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, Del R�o JF, Wiebe M, Peterson P,

G�rard- Marchant P, et�al. 2020. Array programming with NumPy. Nature 585:357–362. DOI: https:// doi. org/ 10.

1038/ s41586- 020- 2649- 2, PMID: 32939066

Hawrylycz MJ, Lein ES, Guillozet- Bongaarts AL, Shen EH, Ng L, Miller JA, van de Lagemaat LN, Smith KA,

Ebbert A, Riley ZL, Abajian C, Beckmann CF, Bernard A, Bertagnolli D, Boe AF, Cartagena PM,

Chakravarty MM, Chapin M, Chong J, Dalley RA, et�al. 2012. An anatomically comprehensive atlas of the adult

human brain transcriptome. Nature 489:391–399. DOI: https:// doi. org/ 10. 1038/ nature11405, PMID: 22996553

Hawrylycz M, Miller JA, Menon V, Feng D, Dolbeare T, Guillozet- Bongaarts AL, Jegga AG, Aronow BJ, Lee C- K,

Bernard A, Glasser MF, Dierker DL, Menche J, Szafer A, Collman F, Grange P, Berman KA, Mihalas S, Yao Z,

Stewart L, et�al. 2015. Canonical genetic signatures of the adult human brain. Nature Neuroscience 18:1832–

1844. DOI: https:// doi. org/ 10. 1038/ nn. 4171

Henderson MX, Cornblath EJ, Darwich A, Zhang B, Brown H, Gathagan RJ, Sandler RM, Bassett DS,

Trojanowski JQ, Lee VMY. 2019. Spread of α-synuclein pathology through the brain connectome is modulated

by selective vulnerability and predicted by network analysis. Nature Neuroscience 22:1248–1257. DOI: https://

doi. org/ 10. 1038/ s41593- 019- 0457- 5, PMID: 31346295

Horv�t S, Gămănuţ R, Ercsey- Ravasz M, Magrou L, Gămănuţ B, Van Essen DC, Burkhalter A, Knoblauch K,

Toroczkai Z, Kennedy H. 2016. Spatial Embedding and Wiring Cost Constrain the Functional Layout of the

Cortical Network of Rodents and Primates. PLOS Biology 14:e1002512. DOI: https:// doi. org/ 10. 1371/ journal.

pbio. 1002512, PMID: 27441598

Page 24

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

24 of 27

Hunter JD. 2007. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 9:90–95. DOI:

https:// doi. org/ 10. 1109/ MCSE. 2007. 55

Kang HJ, Kawasawa YI, Cheng F, Zhu Y, Xu X, Li M, Sousa AMM, Pletikos M, Meyer KA, Sedmak G, Guennel T,

Shin Y, Johnson MB, Krsnik Z, Mayer S, Fertuzinhos S, Umlauf S, Lisgo SN, Vortmeyer A, Weinberger DR, et�al.

2011. Spatio- temporal transcriptome of the human brain. Nature 478:483–489. DOI: https:// doi. org/ 10. 1038/

nature10523, PMID: 22031440

Kharabian Masouleh S, Eickhoff SB, Zeighami Y, Lewis LB, Dahnke R, Gaser C, Chouinard- Decorte F, Lepage C,

Scholtens LH, Hoffstaedter F, Glahn DC, Blangero J, Evans AC, Genon S, Valk SL. 2020. Influence of Processing

Pipeline on Cortical Thickness Measurement. Cerebral Cortex 30:5014–5027. DOI: https:// doi. org/ 10. 1093/

cercor/ bhaa097, PMID: 32377664

Kirsch L, Chechik G. 2016. On Expression Patterns and Developmental Origin of Human Brain Regions. PLOS

Computational Biology 12:e1005064. DOI: https:// doi. org/ 10. 1371/ journal. pcbi. 1005064, PMID: 27564987

Kluyver T, Ragan- Kelley B, P�rez F, Granger BE, Bussonnier M, Frederic J, Kelley K, Hamrick JB, Grout J,

Corlay S. 2016. Jupyter Notebooks–A publishing format for reproducible computational workflows. Loizides F,

Scmidt B (Eds). Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press. p.

1–164.

Kong R, Li J, Orban C, Sabuncu MR, Liu H, Schaefer A, Sun N, Zuo X- N, Holmes AJ, Eickhoff SB, Yeo BTT. 2019.

Spatial Topography of Individual- Specific Cortical Networks Predicts Human Cognition, Personality, and

Emotion. Cerebral Cortex 29:2533–2551. DOI: https:// doi. org/ 10. 1093/ cercor/ bhy123, PMID: 29878084

Krienen FM, Yeo BTT, Ge T, Buckner RL, Sherwood CC. 2016. Transcriptional profiles of supragranular- enriched

genes associate with corticocortical network architecture in the human brain. PNAS 113:E469-E478. DOI:

https:// doi. org/ 10. 1073/ pnas. 1510903113, PMID: 26739559

Lake BB, Ai R, Kaeser GE, Salathia NS, Yung YC, Liu R, Wildberg A, Gao D, Fung H- L, Chen S, Vijayaraghavan R,

Wong J, Chen A, Sheng X, Kaper F, Shen R, Ronaghi M, Fan J- B, Wang W, Chun J, et�al. 2016. Neuronal

subtypes and diversity revealed by single- nucleus RNA sequencing of the human brain. Science 352:1586–

1590. DOI: https:// doi. org/ 10. 1126/ science. aaf1204, PMID: 27339989

Lariviere S, Paquola C, Park B -y, Royer J, Wang Y, Benkarim O, de Wael R, Valk SL, Thomopoulos S, Kirschner M.

2020. The ENIGMA Toolbox: Cross- Disorder Integration and Multiscale Neural Contextualization of Multisite

Neuroimaging Datasets. [bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 2020. 12. 21. 423838

Lau HYG, Fornito A, Fulcher BD. 2021. Scaling of gene transcriptional gradients with brain size across mouse

development. NeuroImage 224:117395. DOI: https:// doi. org/ 10. 1016/ j. neuroimage. 2020. 117395, PMID:

32979525

Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, Boe AF, Boguski MS, Brockway KS, Byrnes EJ,

Chen L, Chen L, Chen T- M, Chin MC, Chong J, Crook BE, Czaplinska A, Dang CN, Datta S, Dee NR, et�al. 2007.

Genome- wide atlas of gene expression in the adult mouse brain. Nature 445:168–176. DOI: https:// doi. org/ 10.

1038/ nature05453, PMID: 17151600

Li M, Santpere G, Imamura Kawasawa Y, Evgrafov OV, Gulden FO, Pochareddy S, Sunkin SM, Li Z, Shin Y, Zhu Y,

Sousa AMM, Werling DM, Kitchen RR, Kang HJ, Pletikos M, Choi J, Muchnik S, Xu X, Wang D,

Lorente- Galdos B, et�al. 2018. Integrative functional genomic analysis of human brain development and

neuropsychiatric risks. Science 362:eaat7615. DOI: https:// doi. org/ 10. 1126/ science. aat7615, PMID: 30545854

Liu J, Xia M, Wang X, Liao X, He Y. 2020. The spatial organization of the chronnectome associates with cortical

hierarchy and transcriptional profiles in the human brain. NeuroImage 222:117296. DOI: https:// doi. org/ 10.

1016/ j. neuroimage. 2020. 117296, PMID: 32828922

Maier- Hein KH, Neher PF, Houde JC, C�t� MA, Garyfallidis E, Zhong J, Chamberland M, Yeh FC, Lin YC, Ji Q,

Reddick WE, Glass JO, Chen DQ, Feng Y, Gao C, Wu Y, Ma J, He R, Li Q, Westin CF, et�al. 2017. The challenge

of mapping the human connectome based on diffusion tractography. Nature Communications 8:1349. DOI:

https:// doi. org/ 10. 1038/ s41467- 017- 01285- x, PMID: 29116093

Markello RD, Misic B. 2021. Comparing spatial null models for brain maps. NeuroImage 236:118052. DOI:

https:// doi. org/ 10. 1016/ j. neuroimage. 2021. 118052, PMID: 33857618

Markello R. 2021a. markello_transcriptome. swh:1:rev:3abbc85596a5baacd93e5e9e56c906c9dbb080f3.

Software Heritage. https:// archive. softwareheritage. org/ swh: 1: dir: ed4b 1a9e 5eb2 449f 1d9f 5bb6 5c51 477a

a8c350dc; origin= https:// github. com/ netneurolab/ markello_ transcriptome; visit= swh: 1: snp: 4f5e eca5 d011 970f

4374 59b4 6fbf 885a c1554644; anchor= swh: 1: rev: 3abb c855 96a5 baac d93e 5e9e 56c9 06c9 dbb080f3

Markello R. 2021b. abagen. swh:1:rev:2aeab5bd0f147fa76b488645e148a1c18095378d. Software Heritage.

https:// archive. softwareheritage. org/ swh: 1: dir: 24ed 1ac6 001e 8767 42bf 4c83 1790 2313 926be07c; origin= https://

github. com/ rmarkello/ abagen; visit= swh: 1: snp: 7d53 4f07 cc7c 0a54 9243 db17 dc6d e7d2 ede98383; anchor= swh: 1:

rev: 2aea b5bd 0f14 7fa7 6b48 8645 e148 a1c1 8095378d

Markello R, Shafiei G, Zheng YQ, Mišić B. 2021c. Rmarkello/Abagen [Zenodo]. DOI: https:// doi. org/ 10. 5281/

zenodo. 3451463

Martins D, Dipasquale O, Veronese M, Turkheimer FE, Loggia M, McMahon S, Williams SC. 2021. Transcriptional

and Cellular Signatures of Cortical Morphometric Similarity Remodelling in Chronic Pain. [bioRxiv]. DOI: https://

doi. org/ 10. 1101/ 2021. 03. 24. 436777

McColgan P, Gregory S, Seunarine KK, Razi A, Papoutsi M, Johnson E, Durr A, Roos RAC, Leavitt BR, Holmans P,

Scahill RI, Clark CA, Rees G, Tabrizi SJ, Track- On HD Investigators. 2018. Brain Regions Showing White Matter

Loss in Huntington’s Disease Are Enriched for Synaptic and Metabolic Genes. Biological Psychiatry 83:456–465.

DOI: https:// doi. org/ 10. 1016/ j. biopsych. 2017. 10. 019, PMID: 29174593

Page 25

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

25 of 27

McKinney W. 2010. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in

Science Conference. . DOI: https:// doi. org/ 10. 25080/ Majora- 92bf1922- 00a

Mess� A. 2020. Parcellation influence on the connectivity- based structure- function relationship in the human

brain. Human Brain Mapping 41:1167–1180. DOI: https:// doi. org/ 10. 1002/ hbm. 24866, PMID: 31746083

Miller JA, Ding S- L, Sunkin SM, Smith KA, Ng L, Szafer A, Ebbert A, Riley ZL, Royall JJ, Aiona K, Arnold JM,

Bennet C, Bertagnolli D, Brouner K, Butler S, Caldejon S, Carey A, Cuhaciyan C, Dalley RA, Dee N, et�al. 2014.

Transcriptional landscape of the prenatal human brain. Nature 508:199–206. DOI: https:// doi. org/ 10. 1038/

nature13185, PMID: 24695229

Mišić B, Fatima Z, Askren MK, Buschkuehl M, Churchill N, Cimprich B, Deldin PJ, Jaeggi S, Jung M, Korostil M,

Kross E, Krpan KM, Peltier S, Reuter- Lorenz PA, Strother SC, Jonides J, McIntosh AR, Berman MG. 2014. The

functional connectivity landscape of the human brain. PLOS ONE 9:e111007. DOI: https:// doi. org/ 10. 1371/

journal. pone. 0111007, PMID: 25350370

Morgan SE, Seidlitz J, Whitaker KJ, Romero- Garcia R, Clifton NE, Scarpazza C, van Amelsvoort T, Marcelis M,

van Os J, Donohoe G, Mothersill D, Corvin A, Pocklington A, Raznahan A, McGuire P, V�rtes PE, Bullmore ET.

2019. Cortical patterning of abnormal morphometric similarity in psychosis is associated with brain expression

of schizophrenia- related genes. PNAS 116:9604–9609. DOI: https:// doi. org/ 10. 1073/ pnas. 1820754116, PMID:

31004051

Negi SK, Guda C. 2017. Global gene expression profiling of healthy human brain and its application in studying

neurological disorders. Scientific Reports 7:897. DOI: https:// doi. org/ 10. 1038/ s41598- 017- 00952- 9, PMID:

28420888

N�rgaard M, Beliveau V, Ganz M, Svarer C, Pinborg LH, Keller SH, Jensen PS, Greve DN, Knudsen GM. 2021. A

high- resolution in vivo atlas of the human brain’s benzodiazepine binding site of GABAA receptors.

NeuroImage 232:117878. DOI: https:// doi. org/ 10. 1016/ j. neuroimage. 2021. 117878, PMID: 33610745

Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH. 2008. Functional

organization of the transcriptome in human brain. Nature Neuroscience 11:1271–1282. DOI: https:// doi. org/

10. 1038/ nn. 2207

Oldham S, Arnatkevic Iūtė A, Smith RE, Tiego J, Bellgrove MA, Fornito A. 2020. The efficacy of different

preprocessing steps in reducing motion- related confounds in diffusion MRI connectomics. NeuroImage

222:117252. DOI: https:// doi. org/ 10. 1016/ j. neuroimage. 2020. 117252, PMID: 32800991

Oliphant TE. 2006. A Guide to NumPy. Trelgol Publishing USA.

Park B -y, Park H, Morys F, Kim M, Byeon K, Lee H, Kim SH, Valk S, Dagher A, Bernhardt B. 2020. Body Mass

Variations Relate to Fractionated Functional Brain Hierarchies. [bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 2020. 08.

07. 241794

Park BY, Bethlehem RA, Paquola C, Larivi�re S, Rodr�guez- Cruces R, Vos de Wael R, Neuroscience in Psychiatry

Network (NSPN) Consortium, Bullmore ET, Bernhardt BC. 2021. An expanding manifold in transmodal regions

characterizes adolescent reconfiguration of structural connectome organization. eLife 10:e64694. DOI: https://

doi. org/ 10. 7554/ eLife. 64694, PMID: 33787489

Parkes L, Fulcher BD, Y�cel M, Fornito A. 2017. Transcriptional signatures of connectomic subregions of the

human striatum. Genes, Brain, and Behavior 16:647–663. DOI: https:// doi. org/ 10. 1111/ gbb. 12386, PMID:

28421658

Parkes L, Fulcher B, Y�cel M, Fornito A. 2018. An evaluation of the efficacy, reliability, and sensitivity of motion

correction strategies for resting- state functional MRI. NeuroImage 171:415–436. DOI: https:// doi. org/ 10. 1016/

j. neuroimage. 2017. 12. 073, PMID: 29278773

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R,

Dubourg V. 2011. Scikit- learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–

2830.

Perez F, Granger BE. 2007. IPython: A System for Interactive Scientific Computing. Computing in Science &

Engineering 9:21–29. DOI: https:// doi. org/ 10. 1109/ MCSE. 2007. 53

Preller KH, Burt JB, Ji JL, Schleifer CH, Adkinson BD, St�mpfli P, Seifritz E, Repovs G, Krystal JH, Murray JD,

Vollenweider FX, Anticevic A. 2018. Changes in global and thalamic brain connectivity in LSD- induced altered

states of consciousness are attributable to the 5- HT2A receptor. eLife 7:e35082. DOI: https:// doi. org/ 10. 7554/

eLife. 35082, PMID: 30355445

Richiardi J, Altmann A, Milazzo AC, Chang C, Chakravarty MM, Banaschewski T, Barker GJ, Bokde ALW,

Bromberg U, B�chel C, Conrod P, Fauth- B�hler M, Flor H, Frouin V, Gallinat J, Garavan H, Gowland P, Heinz A,

Lema�tre H, Mann KF, et�al. 2015. BRAIN NETWORKS. Correlated gene expression supports synchronous

activity in brain networks. Science 348:1241–1244. DOI: https:// doi. org/ 10. 1126/ science. 1255905, PMID:

26068849

Richiardi J, Altmann A, Greicius M. 2017. Distance Is Not Everything in Imaging Genomics of Functional

Networks: Reply to a Commentary on Correlated Gene Expression Supports Synchronous Activity in Brain

Networks. [bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 132746

Rittman T, Rubinov M, V�rtes PE, Patel AX, Ginestet CE, Ghosh BCP, Barker RA, Spillantini MG, Bullmore ET,

Rowe JB. 2016. Regional expression of the MAPT gene is associated with loss of hubs in brain networks and

cognitive impairment in Parkinson disease and progressive supranuclear palsy. Neurobiology of Aging

48:153–160. DOI: https:// doi. org/ 10. 1016/ j. neurobiolaging. 2016. 09. 001, PMID: 27697694

Rittman T, Rittman M, Azevedo T. 2017. Maybrain software package. RittmanResearch. https:// github. com/

RittmanResearch/ maybrain

Page 26

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

26 of 27

Rizzo G, Veronese M, Expert P, Turkheimer FE, Bertoldo A. 2016. MENGA: A New Comprehensive Tool for the

Integration of Neuroimaging Data and the Allen Human Brain Transcriptome Atlas. PLOS ONE 11:e0148744.

DOI: https:// doi. org/ 10. 1371/ journal. pone. 0148744, PMID: 26882227

Roberts JA, Perry A, Lord AR, Roberts G, Mitchell PB, Smith RE, Calamante F, Breakspear M. 2016. The

contribution of geometry to the human connectome. NeuroImage 124:379–393. DOI: https:// doi. org/ 10. 1016/

j. neuroimage. 2015. 09. 009, PMID: 26364864

Romero- Garcia R, Whitaker KJ, V�ša F, Seidlitz J, Shinn M, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN

Consortium, Bullmore ET, V�rtes PE. 2018. Structural covariance networks are coupled to expression of genes

enriched in supragranular layers of the human cortex. NeuroImage 171:256–267. DOI: https:// doi. org/ 10. 1016/

j. neuroimage. 2017. 12. 060, PMID: 29274746

Romme IAC, de Reus MA, Ophoff RA, Kahn RS, van den Heuvel MP. 2017. Connectome Disconnectivity and

Cortical Gene Expression in Patients With Schizophrenia. Biological Psychiatry 81:495–502. DOI: https:// doi.

org/ 10. 1016/ j. biopsych. 2016. 07. 012, PMID: 27720199

Rousseeuw PJ. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal

of Computational and Applied Mathematics 20:53–65. DOI: https:// doi. org/ 10. 1016/ 0377- 0427( 87) 90125-7

Schilling KG, Nath V, Hansen C, Parvathaneni P, Blaber J, Gao Y, Neher P, Aydogan DB, Shi Y, Ocampo- Pineda M,

Schiavi S, Daducci A, Girard G, Barakovic M, Rafael- Patino J, Romascano D, Rensonnet G, Pizzolato M,

Bates A, Fischi E, et�al. 2019. Limits to anatomical accuracy of diffusion tractography using modern approaches.

NeuroImage 185:1–11. DOI: https:// doi. org/ 10. 1016/ j. neuroimage. 2018. 10. 029, PMID: 30317017

Seidlitz J, V�ša F, Shinn M, Romero- Garcia R, Whitaker KJ, V�rtes PE, Wagstyl K, Kirkpatrick Reardon P, Clasen L,

Liu S, Messinger A, Leopold DA, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN Consortium, Raznahan A,

Bullmore ET. 2018. Morphometric Similarity Networks Detect Microscale Cortical Organization and Predict

Inter- Individual Cognitive Variation. Neuron 97:231-247.. DOI: https:// doi. org/ 10. 1016/ j. neuron. 2017. 11. 039,

PMID: 29276055

Seidlitz J, Nadig A, Liu S, Bethlehem RAI, V�rtes PE, Morgan SE, V�ša F, Romero- Garcia R, Lalonde FM,

Clasen LS, Blumenthal JD, Paquola C, Bernhardt B, Wagstyl K, Polioudakis D, de la Torre- Ubieta L,

Geschwind DH, Han JC, Lee NR, Murphy DG, et�al. 2020. Author Correction: Transcriptomic and cellular

decoding of regional brain vulnerability to neurogenetic disorders. Nature Communications 11:5936. DOI:

https:// doi. org/ 10. 1038/ s41467- 020- 19362- z, PMID: 33203864

Sepulcre J, Grothe MJ, d’Oleire Uquillas F, Ortiz- Ter�n L, Diez I, Yang H- S, Jacobs HIL, Hanseeuw BJ, Li Q,

El- Fakhri G, Sperling RA, Johnson KA. 2018. Neurogenetic contributions to amyloid beta and tau spreading in

the human cortex. Nature Medicine 24:1910–1918. DOI: https:// doi. org/ 10. 1038/ s41591- 018- 0206- 4, PMID:

30374196

Shafiei G, Markello RD, Vos de Wael R, Bernhardt BC, Fulcher BD, Misic B. 2020. Topographic gradients of

intrinsic dynamics across neocortex. eLife 9:e62116. DOI: https:// doi. org/ 10. 7554/ eLife. 62116, PMID:

33331819

Shafiei G, Bazinet V, Dadar M. 2021. Global Network Structure and Local Transcriptomic Vulnerability Shape

Atrophy in Sporadic and Genetic Behavioral Variant Frontotemporal Dementia. [bioRxiv]. DOI: https:// doi. org/

10. 1101/ 2021. 08. 24. 457538

Shin J, French L, Xu T, Leonard G, Perron M, Pike GB, Richer L, Veillette S, Pausova Z, Paus T. 2018. Cell- Specific

Gene- Expression Profiles and Cortical Thickness in the Human Brain. Cerebral Cortex 28:3267–3277. DOI:

https:// doi. org/ 10. 1093/ cercor/ bhx197, PMID: 28968835

Shine JM, Breakspear M, Bell PT, Ehgoetz Martens KA, Shine R, Koyejo O, Sporns O, Poldrack RA. 2019. Human

cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nature

Neuroscience 22:289–296. DOI: https:// doi. org/ 10. 1038/ s41593- 018- 0312- 0, PMID: 30664771

Simmons JP, Nelson LD, Simonsohn U. 2011. False- positive psychology: undisclosed flexibility in data collection

and analysis allows presenting anything as significant. Psychological Science 22:1359–1366. DOI: https:// doi.

org/ 10. 1177/ 0956797611417632, PMID: 22006061

Sousa AMM, Zhu Y, Raghanti MA, Kitchen RR, Onorati M, Tebbenkamp ATN, Stutz B, Meyer KA, Li M,

Kawasawa YI, Liu F, Perez RG, Mele M, Carvalho T, Skarica M, Gulden FO, Pletikos M, Shibata A,

Stephenson AR, Edler MK, et�al. 2017. Molecular and cellular reorganization of neural circuits in the human

lineage. Science 358:1027–1032. DOI: https:// doi. org/ 10. 1126/ science. aan3456, PMID: 29170230

Steegen S, Tuerlinckx F, Gelman A, Vanpaemel W. 2016. Increasing Transparency Through a Multiverse Analysis.

Perspectives on Psychological Science 11:702–712. DOI: https:// doi. org/ 10. 1177/ 1745691616658637, PMID:

27694465

Thirion B, Varoquaux G, Dohmatob E, Poline JB. 2014. Which fMRI clustering gives good brain parcellations?

Frontiers in Neuroscience 8:167. DOI: https:// doi. org/ 10. 3389/ fnins. 2014. 00167, PMID: 25071425

Thompson WH, Wright J, Bissett PG, Poldrack RA. 2020. Dataset decay and the problem of sequential analyses

on open datasets. eLife 9:e53498. DOI: https:// doi. org/ 10. 7554/ eLife. 53498, PMID: 32425159

Valk SL, Kanske P, Park B -y, Hong SJ, Boeckler- Raettig A, Trautwein FM, Bernhardt BC, Singer T. 2021.

Functional Network Plasticity of the Human Social Brain. [bioRxiv]. DOI: https:// doi. org/ 10. 1101/ 2020. 11. 11.

377895

van der Walt S, Colbert SC, Varoquaux G. 2011. The NumPy Array: A Structure for Efficient Numerical

Computation. Computing in Science & Engineering 13:22–30. DOI: https:// doi. org/ 10. 1109/ MCSE. 2011. 37

Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K, WU- Minn HCP Consortium. 2013. The

WU- Minn Human Connectome Project: an overview. NeuroImage 80:62–79. DOI: https:// doi. org/ 10. 1016/ j.

neuroimage. 2013. 05. 041, PMID: 23684880

Page 27

Tools and resources

Neuroscience

Markello et�al. eLife 2021;10:e72129. DOI: https:// doi. org/ 10. 7554/ eLife. 72129

27 of 27

V�rtes PE, Rittman T, Whitaker KJ, Romero- Garcia R, V�ša F, Kitzbichler MG, Wagstyl K, Fonagy P, Dolan RJ,

Jones PB, Goodyer IM, NSPN Consortium, Bullmore ET. 2016. Gene transcription profiles associated with

inter- modular hubs and connection distance in human functional magnetic resonance imaging networks.

Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 371:20150362. DOI:

https:// doi. org/ 10. 1098/ rstb. 2015. 0362, PMID: 27574314

Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P,

Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E,

Kern R, Larson E, Carey CJ, et�al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python.

Nature Methods 17:261–272. DOI: https:// doi. org/ 10. 1038/ s41592- 019- 0686- 2, PMID: 32015543

Vogel JW, Iturria- Medina Y, Strandberg OT, Smith R, Levitis E, Evans AC, Hansson O, Alzheimer’s Disease

Neuroimaging Initiative, Swedish BioFinder Study. 2020. Spread of pathological tau proteins through

communicating neurons in human Alzheimer’s disease. Nature Communications 11:2612. DOI: https:// doi. org/

10. 1038/ s41467- 020- 15701- 2, PMID: 32457389

Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, Clarke D, Gu M, Emani P, Yang YT, Xu M, Gandal MJ, Lou S,

Zhang J, Park JJ, Yan C, Rhie SK, Manakongtreecheep K, Zhou H, Nathan A, et�al. 2018. Comprehensive

functional genomic resource and integrative model for the human brain. Science 362:eaat8464. DOI: https://

doi. org/ 10. 1126/ science. aat8464, PMID: 30545857

Waskom M, Botvinnik O, OḰane D, Hobson P, Ostblom J, Lukauskas S, Gemperline DC, Augspurger T,

Halchenko Y, Cole JB. 2018. Mwaskom/Seaborn [Zenodo]. DOI: https:// doi. org/ 10. 5281/ zenodo. 592845

Waskom M, Larson E, Brodbeck C, Gramfort A, Burns S, Luessi M, Weidemann CT, Bitzer S, Markiewicz C,

LaPlante R, Halchenko Y, Engemann DA, van Vliet M, Ghosh S, Klein N, Piantoni G, Brett M, Gwilliams L,

King JR, Liu D. 2020. nipy/pysurfer [Zenodo]. DOI: https:// doi. org/ 10. 5281/ zenodo. 592515

Whitaker KJ, V�rtes PE, Romero- Garcia R, V�ša F, Moutoussis M, Prabhu G, Weiskopf N, Callaghan MF,

Wagstyl K, Rittman T, Tait R, Ooi C, Suckling J, Inkster B, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN

Consortium, Bullmore ET. 2016. Adolescence is associated with genomically patterned consolidation of the

hubs of the human brain connectome. PNAS 113:9105–9110. DOI: https:// doi. org/ 10. 1073/ pnas. 1601745113,

PMID: 27457931

Yao Z, van Velthoven CTJ, Nguyen TN, Goldy J, Sedeno- Cortes AE, Baftizadeh F, Bertagnolli D, Casper T,

Chiang M, Crichton K, Ding S- L, Fong O, Garren E, Glandon A, Gouwens NW, Gray J, Graybuck LT,

Hawrylycz MJ, Hirschstein D, Kroll M, et�al. 2021. A taxonomy of transcriptomic cell types across the isocortex

and hippocampal formation. Cell 184:3222-3241.. DOI: https:// doi. org/ 10. 1016/ j. cell. 2021. 04. 021, PMID:

34004146

Zhao K, Zheng Q, Che T, Martin D, Li Q, Ding Y, Zheng Y, Liu Y, Li S. 2020. Regional Radiomics Similarity

Networks (R2SN) in the Human Brain: Reproducibility, Small- World and Biological Basis. [bioRxiv]. DOI: https://

doi. org/ 10. 1101/ 2020. 12. 09. 418509

Zheng Y- Q, Zhang Y, Yau Y, Zeighami Y, Larcher K, Misic B, Dagher A, Kennedy H. 2019. Local vulnerability and

global connectivity jointly shape neurodegenerative disease propagation. PLOS Biology 17:e3000495. DOI:

https:// doi. org/ 10. 1371/ journal. pbio. 3000495