Tags - Cross Validated

regression

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

30131 questions

r

Use this tag for any *on-topic* question that (a) involves `R` either as a critical part of the question or expected answer, & (b) is not *just* about how to use `R`.

30007 questions

30 asked this week, 116 this month

machine-learning

Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervis…

20192 questions

11 asked this week, 48 this month

time-series

Time series are data observed over time (either in continuous time or at discrete time periods).

14381 questions

11 asked this week, 53 this month

probability

A probability provides a quantitative description of the likely occurrence of a particular event.

12691 questions

8 asked this week, 33 this month

hypothesis-testing

Hypothesis testing assesses whether data are inconsistent with a given hypothesis, or whether the apparent discrepancy can be attributed to random fluctuations.

10862 questions

16 asked this week, 63 this month

distributions

A distribution is a mathematical description of probabilities or frequencies.

9643 questions

12 asked this week, 30 this month

self-study

A routine exercise designed to test one's knowledge; often from a textbook, course, or test used for a class or self-study. This community's policy is to "provide helpful hints" for such questions rat…

8149 questions

5 asked this month, 205 this year

neural-networks

Artificial neural networks (ANNs) are a broad class of computational models loosely based on biological neural networks. They encompass feedforward NNs (including "deep" NNs), convolutional NNs, recu…

8100 questions

7 asked this week, 29 this month

bayesian

Bayesian inference is a method of statistical inference that relies on treating the model parameters as random variables and applying Bayes' theorem to deduce subjective probability statements about t…

8086 questions

7 asked this week, 30 this month

logistic

Refers generally to statistical procedures that utilize the logistic function, most commonly various forms of logistic regression.

7927 questions

7 asked this week, 25 this month

mathematical-statistics

Mathematical theory of statistics, concerned with formal definitions and general results.

7827 questions

7 asked this week, 36 this month

classification

Statistical classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of dat…

6914 questions

6 asked this week, 22 this month

mixed-model

Mixed (aka multilevel or hierarchical) models are linear models that include both fixed effects and random effects. They are used to model longitudinal or nested data.

6510 questions

11 asked this week, 44 this month

statistical-significance

Statistical significance is a characteristic of a statistic viewed in light of a null hypothesis and a given significance level. It reflects whether the statistic belongs to the rejection region (is s…

6453 questions

9 asked this week, 24 this month

correlation

A measure of the degree of association between a pair of variables.

6403 questions

10 asked this week, 42 this month
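As one illustration of such a measure, the sketch below computes the Pearson product-moment correlation coefficient. Pearson's r is only one choice among several (rank-based measures such as Spearman's are others), and the function name `pearson_r` and the sample data are made up for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the centered values.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: ys is an exact increasing linear function of xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
r_increasing = pearson_r(xs, ys)
r_decreasing = pearson_r(xs, list(reversed(ys)))
```

An exactly linear increasing relationship gives a coefficient of 1 (up to floating-point rounding), and reversing one of the sequences flips the sign.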

normal-distribution

The normal, or Gaussian, distribution has a density function that is a symmetrical bell-shaped curve. It is one of the most important distributions in statistics. Use the [normality] tag for asking ab…

6128 questions

9 asked this month, 253 this year

multiple-regression

Regression that includes two or more non-constant independent variables.

5609 questions

8 asked this month, 367 this year

anova

ANOVA stands for ANalysis Of VAriance, a statistical model and set of procedures for comparing multiple group means. The independent variables in an ANOVA model are categorical, but an ANOVA table can…

5372 questions

5 asked this week, 19 this month

python

Python is a programming language commonly used for machine learning. Use this tag for any *on-topic* question that (a) involves `Python` either as a critical part of the question or expected answer, &…

4818 questions

7 asked this week, 19 this month

generalized-linear-model

A generalization of linear regression allowing for nonlinear relationships via a "link function" and for the variance of the response to depend on the predicted value. (Not to be confused with "genera…

4658 questions

5 asked this week, 23 this month

confidence-interval

A confidence interval is an interval that covers an unknown parameter with $100(1-\alpha)\%$ confidence. Confidence intervals are a frequentist concept. They are often confused with credible intervals…

4641 questions

17 asked this month, 273 this year

variance

The expected squared deviation of a random variable from its mean; or, the average squared deviation of data about their mean.

4269 questions

9 asked this month, 218 this year
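The second reading of the definition (the average squared deviation of data about their mean) can be sketched directly. The helper names and the data below are illustrative assumptions. The sample version, which divides by n - 1, is shown for comparison and is not part of the tag text.

```python
from statistics import fmean, pvariance, variance


def population_variance(xs):
    """Average squared deviation of the data about their mean (divide by n)."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def sample_variance(xs):
    """Same sum of squares, divided by n - 1 (the usual unbiased estimator)."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
pop_var = population_variance(data)   # 32 / 8
samp_var = sample_variance(data)      # 32 / 7
```

The standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities and can serve as a cross-check.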

clustering

Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors a…

4038 questions

151 asked this year

forecasting

Prediction of future events. It is a special case of [prediction], in the context of [time-series].

3904 questions

5 asked this week, 12 this month

t-test

A test for comparing the means of two samples, or the mean of one sample (or even parameter estimates) with a specified value; also known as the "Student t-test" after the pseudonym of its inventor.

3700 questions

10 asked this month, 203 this year

categorical-data

Categorical (also called nominal) data can take on a limited number of possible values called categories. Categorical values "label"; they do not "measure". Please use the [ordinal-data] tag for discrete …

3590 questions

12 asked this month, 146 this year

lme4-nlme

lme4 and nlme are R packages used for fitting linear, generalized linear and nonlinear mixed effects models. For general questions about mixed models use the [mixed-model] tag.

3501 questions

8 asked this week, 18 this month

cross-validation

Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.

3491 questions

11 asked this month, 121 this year
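A minimal k-fold sketch of this idea follows, under stated assumptions: the "model" is simply the mean of the training responses, performance is measured by mean squared error, and the function names and data are hypothetical. Real analyses would usually rely on an established library rather than hand-rolled folds.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mse(ys, k=5, seed=0):
    """Average held-out MSE of a mean-only predictor over k folds."""
    folds = k_fold_indices(len(ys), k, seed)
    fold_errors = []
    for held_out in folds:
        withheld = set(held_out)
        # Fit on everything except the withheld fold.
        train = [y for i, y in enumerate(ys) if i not in withheld]
        prediction = sum(train) / len(train)
        # Score only on the withheld fold.
        squared = [(ys[i] - prediction) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


# Hypothetical responses.
ys = [3.1, 2.9, 3.4, 3.0, 2.7, 3.3, 3.2, 2.8, 3.0, 3.1]
folds = k_fold_indices(len(ys), k=5)
cv_mse = cross_validated_mse(ys, k=5)
```

Each observation is withheld exactly once, so every prediction is scored on data the fit did not see.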

pca

Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables preserving as much information (as much v…

3447 questions

10 asked this month, 156 this year

maximum-likelihood

A method of estimating the parameters of a statistical model by choosing the parameter value that maximizes the probability of observing the given sample.

3418 questions

22 asked this month, 210 this year
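As a small worked case, the sketch below estimates the success probability of a Bernoulli model by maximizing the log-likelihood over a grid and compares the result with the known closed-form answer, the sample proportion. The data, the grid resolution, and the function name are assumptions made for illustration. A grid search is used only for transparency; practical fitting normally uses closed forms or numerical optimizers.

```python
from math import log


def bernoulli_log_likelihood(p, data):
    """Log-probability of observing the 0/1 sample `data` when P(1) = p."""
    successes = sum(data)
    failures = len(data) - successes
    return successes * log(p) + failures * log(1 - p)


# Hypothetical sample: 7 successes in 10 trials.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Candidate values strictly inside (0, 1) so both logarithms are defined.
grid = [i / 1000 for i in range(1, 1000)]
p_hat_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

# Closed-form maximizer for the Bernoulli model: the sample proportion.
p_hat_closed = sum(data) / len(data)
```

Here the grid maximizer and the sample proportion coincide at 0.7, because the Bernoulli log-likelihood is concave with its peak at the observed proportion.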

survival

Survival analysis models time-to-event data, typically time to death or failure time. Censored data are a common problem for survival analyses.

3383 questions

10 asked this week, 36 this month

sampling

Creating samples from a well-specified population using a probabilistic method and/or producing random numbers from a specified distribution. As this tag is ambiguous, please consider [survey-sampling…

3280 questions

9 asked this month, 189 this year

estimation

This tag is too general; please provide a more specific tag. For questions about the properties of specific estimators, use the [estimators] tag instead.

3266 questions

11 asked this month, 103 this year

predictive-models

Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain …

3188 questions

17 asked this month, 239 this year

data-visualization

Constructing and interpreting meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely no…

3095 questions

8 asked this month, 123 this year