Microarray Data Preprocessing: From Experimental Design to Differential Analysis

  • Protocol
Microarray Data Analysis

Abstract

DNA microarray data preprocessing is a critical part of the analytical path that starts with the experimental design and ends with a reliable biological interpretation. When all relevant aspects of the experimental plan have been considered, the subsequent steps, from data quality check to differential analysis, lead to robust, trustworthy results. This chapter discusses the relevant aspects of, and considerations for, microarray preprocessing. Preprocessing steps are presented in order, from experimental design to quality check and batch effect removal, including the most common visualization methods. We also discuss data representation and differential testing methods, focusing on the most common microarray technologies, such as gene expression and DNA methylation.

Author information

Corresponding author

Correspondence to Dario Greco.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Federico, A. et al. (2022). Microarray Data Preprocessing: From Experimental Design to Differential Analysis. In: Agapito, G. (ed.) Microarray Data Analysis. Methods in Molecular Biology, vol 2401. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1839-4_7

  • DOI: https://doi.org/10.1007/978-1-0716-1839-4_7

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1838-7

  • Online ISBN: 978-1-0716-1839-4

  • eBook Packages: Springer Protocols
