Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction

X Xia - Scientifica, 2012 - Wiley Online Library
Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site‐specific frequencies, PWM itself, and PWMS, are in disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM‐based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site‐specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution.
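
To make the terms in the abstract concrete, the following is a minimal Python sketch of how a PWM can be built from aligned sites and how a PWMS can be computed for a sequence segment. It is an illustration under stated assumptions, not the paper's implementation: the function names (`build_pwm`, `pwm_score`, `scan`), the uniform background frequencies, the add-one pseudocount scheme, and the toy sites are all hypothetical choices made here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: nucleotide sequences only


def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds PWM from equal-length aligned sites.

    Assumptions (not taken from the paper): uniform background
    frequencies by default, and the same pseudocount added to
    every nucleotide count at every position.
    """
    if background is None:
        background = {nt: 0.25 for nt in ALPHABET}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        column = {}
        for nt in ALPHABET:
            # Site-specific frequency, smoothed by the pseudocount
            freq = (counts[nt] + pseudocount) / total
            column[nt] = math.log2(freq / background[nt])
        pwm.append(column)
    return pwm


def pwm_score(pwm, segment):
    """PWMS of one segment: the sum of its per-position weights."""
    return sum(column[nt] for column, nt in zip(pwm, segment))


def scan(pwm, sequence):
    """Score every window of motif width along a sequence."""
    width = len(pwm)
    return [
        (start, pwm_score(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


# Toy aligned sites, invented for illustration only
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
best_start, best_score = max(scan(pwm, "GGTATAATCC"), key=lambda hit: hit[1])
```

Because the pseudocount changes every weight in the matrix, a PWM built this way does not follow a simple closed-form null distribution, which is consistent with the abstract's recommendation to test the PWM by resampling.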