In the present report, we introduced the abagen toolbox, an open-source Python library for processing
transcriptomic data. Using abagen, we conducted a comprehensive analysis examining whether and
how different processing options modify statistical estimates derived from analyses using the AHBA.
We investigated how processing pipelines used in the literature compare to those we tested, and
provided recommendations for improving standardization and reporting of analyses using the AHBA,
highlighting how the abagen toolbox can facilitate future developments in this space.
Testing nearly 750,000 unique processing pipelines, we find that choice of processing parameters
can strongly influence statistical estimates derived from analyses of the AHBA, and that these choices
interact with the type of analysis performed (Figure 1). We observe significant variability with regard
to which parameters are most influential, finding that procedures modifying gene expression
normalization have a far greater impact on downstream analyses than other processing steps (Figure 2).
Looking to the literature, we reproduce nine pipelines from published articles and find that, despite
notable inconsistencies in their processing choices, there is moderate consistency in their produced
statistical estimates (Figure 3). We demonstrate, however, that these summary estimates may obscure
meaningful differences in gene expression values derived by the pipelines, cautioning researchers to
be aware of how analytic choices may impact their findings.
Altogether, the present report provides a comprehensive assessment of how processing variability
can impact analyses in the field of imaging transcriptomics. Our results demonstrate how researcher
choices (or ‘researcher degrees of freedom’; Simmons et al., 2011) can play a meaningful role in
analyses of the AHBA. However, these findings are not necessarily limited to the AHBA. Indeed, increasing
reliance on open-access datasets has begun to reveal unique challenges associated with data reuse
re-using) openly available datasets may help to mitigate some of these challenges. We believe that
functionality in the abagen toolbox can support future researchers in overcoming these pitfalls and
improve reproducibility in processing and analyzing AHBA data.
Our results also show that not all processing choices are equal: that is, we find a hierarchy of
processing parameters, wherein procedures modifying gene normalization have the greatest impact
on analyses, followed by steps more broadly influencing the matching of tissue samples to brain
regions and finally by parameters that determine probe selection. Furthermore, we find that within
processing steps certain parameter choices may lead to more reasonable statistical estimates. In
particular, applying some form of gene normalization tends to improve the behavior of processed
expression data when compared to instances in which no normalization is applied (Figure 1), but
there appear to be limited differences in the type of normalization used. Although we only considered
cortical tissue samples in the current analyses, we expect that including non-cortical samples would
sion values between cortex and subcortical structures will likely emphasize the impact of different
normalization procedures across pipelines. Critically, these findings largely agree with previous
choices for abagen workflows accordingly.
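
To make the normalization point above concrete, the following is a minimal sketch of two common gene normalization schemes applied to one gene's expression across regions: a standard z-score and a scaled robust sigmoid (median and interquartile range, then rescaled to the unit interval). The function names, the synthetic values, and the 1.35 IQR scaling constant are assumptions chosen for illustration; this is not the abagen implementation and should be read only as a sketch of the general idea.

```python
import math
import statistics


def zscore(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def scaled_robust_sigmoid(values):
    """Sigmoid centered on the median and scaled by IQR / 1.35, then min-max rescaled."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    scale = (q3 - q1) / 1.35  # assumed constant relating the IQR to a standard deviation
    sig = [1.0 / (1.0 + math.exp(-(v - median) / scale)) for v in values]
    lo, hi = min(sig), max(sig)
    return [(s - lo) / (hi - lo) for s in sig]


# Hypothetical expression of one gene across eight regions; the last value is an outlier.
gene = [4.1, 4.4, 4.0, 4.6, 4.3, 4.2, 4.5, 9.0]

z = zscore(gene)
srs = scaled_robust_sigmoid(gene)


def rank_order(xs):
    """Indices of xs sorted by value, used to compare orderings."""
    return sorted(range(len(xs)), key=xs.__getitem__)
```

Both transforms are monotonic, so they preserve the rank order of regions for a given gene; they differ mainly in how strongly an outlying sample stretches the remaining values, which is one way to see why the choice between normalization types may matter less than the choice to normalize at all.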
Note that there are some processing steps that should be performed in a specific sequence, and
others whose order could potentially be interchanged. For example, intensity-based filtering of probes