Abstract
Preprocessing of DNA microarray data is a critical part of the analytical path that starts with experimental design and ends with a reliable biological interpretation. When all relevant aspects of the experimental plan have been considered, the subsequent steps, from data quality check to differential analysis, can lead to robust, trustworthy results. This chapter discusses the relevant aspects of, and considerations for, microarray preprocessing. The preprocessing steps are presented in order, from experimental design to quality check and batch effect removal, together with the most common visualization methods. We also discuss data representation and differential testing methods, focusing on the most common microarray technologies, such as gene expression and DNA methylation.
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Federico, A. et al. (2022). Microarray Data Preprocessing: From Experimental Design to Differential Analysis. In: Agapito, G. (eds) Microarray Data Analysis. Methods in Molecular Biology, vol 2401. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1839-4_7
DOI: https://doi.org/10.1007/978-1-0716-1839-4_7
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1838-7
Online ISBN: 978-1-0716-1839-4
eBook Packages: Springer Protocols