Cross Validated Community Digest

Top new questions this week:

Examples of distributions with easily solvable quantile functions but hard to solve CDFs

I'm interested in examples of probability distributions where the quantile function $F^{-1}(p)$ exists in closed form or is easy to calculate but where the cumulative distribution function (CDF) $F(x)$...

quantiles cumulative-distribution-function  
Asked by spencergw · Score of 5
Answered by Glen_b · Score of 8
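Here is a concrete illustration of the kind of distribution the question asks about. It is our own example and is not taken from the linked answer. The Tukey lambda family has an explicit quantile function, $Q(p) = (p^\lambda - (1-p)^\lambda)/\lambda$ for $\lambda \ne 0$, with the logistic quantile $\log(p/(1-p))$ as the $\lambda = 0$ limit. Its CDF, however, generally has no closed form. The sketch below evaluates Q directly and obtains F(x) only by inverting Q numerically with bisection. The function names are hypothetical. The bisection is intended for moderate x, because for strongly negative λ and extreme x the power terms can overflow.

```python
import math

def tukey_lambda_quantile(p: float, lam: float) -> float:
    """Closed-form quantile function Q(p) of the Tukey lambda distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if lam == 0.0:
        # The lambda -> 0 limit is the logistic quantile function.
        return math.log(p / (1.0 - p))
    return (p ** lam - (1.0 - p) ** lam) / lam

def tukey_lambda_cdf(x: float, lam: float, iters: int = 200) -> float:
    """F(x) has no general closed form, so invert the increasing Q(p) by bisection on p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # the interval can no longer be split in floating point
        if tukey_lambda_quantile(mid, lam) < x:
            lo = mid  # Q(mid) < x, so F(x) > mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because Q is explicit, inverse-transform sampling only requires applying Q to uniform draws on (0, 1). The CDF, and therefore anything built on it, is the part that needs numerical work.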

I think the standard deviation of y is related to the size of x. How do I model and test this?

I have a sample of data $(x_i, y_i)$. I hypothesize that $y_i$ is not dependent on $x_i$, but the standard deviation of $y_i$ depends on $x_i$. More concretely, say I assume $\textrm{Var}(y_i | x_i) = ...

regression data-visualization standard-deviation heteroscedasticity  
Asked by Lost1 · Score of 5
Answered by Frans Rodenburg · Score of 6

Standard negative binomial regression when counts are mainly zeros?

This question must have been asked many times before but I can't find an answer. I'm getting very confused about when to use a zero-inflated negative binomial regression vs standard negative binomial ...

regression count-data negative-binomial-distribution zero-inflation structural-zero  
Asked by Picapica · Score of 5
Answered by Stephan Kolassa · Score of 9

Is it statistically appropriate to aggregate indexes?

I would like to know if aggregating indexes is statistically appropriate. Specifically, I would like to aggregate the CDC’s social vulnerability index (SVI): https://www.atsdr.cdc.gov/placeandhealth/...

aggregation  
Asked by KKMK · Score of 5
Answered by Ben · Score of 5

I have count data, but it does not follow a Poisson distribution. What should I do?

My data consist of 3 variables: one is a numerical variable, the number of flower visits I have counted at certain locations and on certain shrub species. My other 2 variables are categorical ...

chi-squared-test poisson-distribution  
Asked by Geertje · Score of 4
Answered by Nick Cox · Score of 2

Welch t-test p-values are poorly calibrated for $N=2$ samples

I am performing a large number of Welch's t-tests (t-tests with unequal variances) on very small sample sizes, often with only two samples per condition. I am finding the p-values are poorly calibrated: ...

t-test p-value calibration  
Asked by emarti · Score of 4
Answered by Dave · Score of 6

Statistical Inference on Samples vs Populations

I am looking at the question Statistical inference when the sample "is" the population, where the following analysis is given: User 1: "Treat the population data as a sample and assume ...

hypothesis-testing  
Asked by hungryhungryhypothesis · Score of 4
Answered by Christian Hennig · Score of 1

Greatest hits from previous weeks:

What is the meaning of p values and t values in statistical tests?

After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It ...

hypothesis-testing p-value interpretation intuition faq  
Asked by Sharpie · Score of 299
Answered by whuber · Score of 163

How to reverse PCA and reconstruct original variables from several principal components?

Principal component analysis (PCA) can be used for dimensionality reduction. After such dimensionality reduction is performed, how can one approximately reconstruct the original variables/features ...

pca dimensionality-reduction svd  
Asked by amoeba · Score of 175
Answered by amoeba · Score of 241
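The reconstruction the question asks about can be written in matrix form. The notation is ours and may differ from the answer's. Let X be the n × p data matrix, μ the vector of column means, 1 a column of n ones, and V_k the p × k matrix whose columns are the first k principal axes. The scores and the rank-k reconstruction are then:

```latex
Z = (X - \mathbf{1}\mu^\top)\, V_k, \qquad
\hat{X} = Z\, V_k^\top + \mathbf{1}\mu^\top
```

With k = p the reconstruction is exact, because V is orthogonal. If the variables were also standardized before PCA, each column must be multiplied back by its standard deviation before the mean is added.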

How to determine which variable goes on the X & Y axes in a scatterplot?

I am trying to do a scatterplot to see the relationship between literacy and baby mortality. How do I know if literacy is my X axis and baby mortality is my Y axis, or the reverse? How do I ...

data-visualization scatterplot  
Asked by Beth · Score of 10
Answered by Glen_b · Score of 17

Softmax vs Sigmoid function in Logistic classifier?

What decides the choice of function (Softmax vs Sigmoid) in a Logistic classifier? Suppose there are 4 output classes. Each of the above function gives the probabilities of each class being the ...

machine-learning logistic classification softmax  
Asked by mach · Score of 132
Answered by Franck Dernoncourt · Score of 176
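The sketch below illustrates the distinction the question raises, in plain Python with function names of our own. Softmax turns K logits into K probabilities that sum to 1, so it suits mutually exclusive classes such as the 4-class example. Applying a sigmoid to each logit separately gives K independent probabilities that need not sum to 1, which suits multi-label problems.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: an independent probability for one logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits: list[float]) -> list[float]:
    """Normalize K logits into K probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1, -1.0]              # illustrative scores for 4 classes
exclusive = softmax(logits)                 # one distribution over 4 exclusive classes
independent = [sigmoid(z) for z in logits]  # 4 separate yes/no probabilities
```

For two classes the choices coincide: softmax([z, 0])[0] equals sigmoid(z). This is why binary logistic regression is usually written with a single sigmoid output.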

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras+Tensorflow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy ...

machine-learning conv-neural-network loss-functions information-theory cross-entropy  
Asked by kedarps · Score of 90
Answered by skadaver · Score of 107
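The two Keras losses named in the question compute the same quantity and differ only in the target format. `categorical_crossentropy` expects one-hot target vectors, while `sparse_categorical_crossentropy` expects integer class indices. The plain-Python sketch below uses hypothetical helper names and is not the Keras implementation. It shows that both forms reduce to −log of the probability assigned to the true class.

```python
import math

def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy with a one-hot target vector: -sum_k t_k * log(p_k)."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_cross_entropy(label, probs):
    """Same loss with an integer class index: only the true class's term survives."""
    return -math.log(probs[label])

probs = [0.1, 0.7, 0.2]  # predicted class probabilities for one example
label = 1                # true class as an integer index
one_hot = [1.0 if k == label else 0.0 for k in range(len(probs))]

dense_loss = categorical_cross_entropy(one_hot, probs)
sparse_loss = sparse_categorical_cross_entropy(label, probs)
```

The sparse form avoids building one-hot vectors, which becomes convenient when the number of classes is large.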

Correlations with unordered categorical variables

I have a dataframe with many observations and many variables. Some of them are categorical (unordered) and the others are numerical. I'm looking for associations between these variables. I've been ...

r correlation categorical-data continuous-data mixed-type-data  
Asked by Clément F · Score of 149
Answered by gung - Reinstate Monica · Score of 135

How to normalize data between -1 and 1?

I have seen the min-max normalization formula but that normalizes values between 0 and 1. How would I normalize my data between -1 and 1? I have both negative and positive values in my data matrix.

dataset normalization  
Asked by covfefe · Score of 94
Answered by Simone · Score of 197
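Min-max scaling generalizes to any target interval [a, b] via x' = a + (x − min)(b − a)/(max − min). For [−1, 1] this becomes x' = 2(x − min)/(max − min) − 1. Here is a minimal sketch, where the function name `rescale` is our own:

```python
def rescale(values, new_min=-1.0, new_max=1.0):
    """Linearly map the data's [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale constant data")
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]

scaled = rescale([-5, 0, 5, 15])  # mixed negative and positive input
```

This mapping sends the minimum to −1 and the maximum to 1, so zero does not generally stay at zero. If the sign of each value matters, dividing by max |x| is an alternative. It also lands in [−1, 1] but reaches both ends only when the data are symmetric.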

Can you answer these questions?

Sample Size: Cluster Randomized Trials

I seek a peer check on my approach to calculating the sample size for pair-matched or stratified cluster randomized trials assuming 80% power. Some background on the study: I am working with ...

r sample-size epidemiology cluster-sample  
Asked by Tavaro Evanis · Score of 1

Rescale measures of association for meta-analysis (e.g., log-transformed independent variables)

I am carrying out a meta-analysis of studies evaluating the association between blood levels of specific environmental pollutants and health outcomes (binary). Some studies reported OR/RR/HR for ...

logistic data-transformation meta-analysis logarithm  
Asked by msas · Score of 1

Seeking suggestions for visualizing relationships between extracted topics and ratings in R

I have a set of online product review data. The dataset contains review text and 1-5 star ratings. I've extracted 5 prevalence topics using the R stm package. They are price, design, packaging, promotion ...

r data-visualization  
Asked by James · Score of 1

You're receiving this message because you subscribed to the Cross Validated community digest.
Stack Overflow

Stack Overflow, 14 Wall Street, 20th Floor, New York, NY 10005
